Environmental Monitoring and Measurement Advisor – A New Expert System
Presented at WTQA 2002, Arlington, VA - August 2002
by
Larry Keith, Instant Reference Sources, Inc., 890 Providence Club Dr., Monroe, GA. 30656.
E-mail: larrykeith@earthlink.net
In the Spring of 2002 a prototype expert system was developed under a National Science Foundation grant to facilitate systematic planning for environmental monitoring. A commercial expert system shell was used to develop the interactive software rather than "re-inventing a wheel." Over one hundred commercial software "shells" were evaluated and the Exsys Company’s Corvid program was purchased and used to develop the Environmental Monitoring and Measurement Advisor (EMMA). In addition to powerful attributes of the Corvid software that facilitate programming with complex decision rules, an attractive feature is the ability to use Windows-based frames in which additional information can be provided. This feature was used extensively in the development of EMMA as a way to enhance the technical transfer capabilities of the product by providing the user with guidance and in-depth information about the questions being asked and the answers given.
The expert system was designed for two types of users: (1) busy executives and project managers who want the best answers and advice as quickly as possible, and (2) students and less experienced practitioners who want to learn more about the complex subject of environmental monitoring. The final task involved an independent beta test and review of the product by less experienced students and by experienced faculty and staff at Tennessee Technological University. This was done through a seminar in which students and faculty involved in environmental analysis were invited to a hands-on workshop using EMMA. Of the reviewers who responded, 100% said that they learned something new using EMMA and 100% also said that the information provided was clear and helpful.

Figure 1. User Interface for EMMA
An innovative approach uses two frames with the interactive software system: the left frame occupies 65% of the viewer’s screen and contains the expert system while the right frame occupies 35% of the screen and contains the ancillary information. The right hand frame is referred to as the "Explanation Box." The source code used for the Explanation Box is entirely HTML and this enables the addition of many useful features such as internal and external hyperlinks and "pop up" boxes. The pop up boxes appear when the user’s cursor hovers over a hyperlinked word or phrase; they contain definitions, clarifications, and additional instructions. Figure 1 illustrates a typical view with this user interface and it also includes an example of a pop up definition of "systematic errors" within the Explanation Box.
A unique feature incorporated into the software is three different "planes" of information content. The top level "Executive Plane" is designed to give busy executives and project managers advice and answers for their site-specific needs as quickly as possible. The second level "Educational Plane" provides explanations and ancillary information to help less experienced users understand what is being asked of them, why it is being asked, and what the consequences of their actions may be. The third level "Research Plane" is reached by internal and external hyperlinks from the second plane and contains extensive in-depth discussions of selected topics as well as links to other web sites such as those of the US Environmental Protection Agency, the interagency Methods and Data Comparability Board, the American Chemical Society Division of Environmental Chemistry, etc. Figure 2 illustrates this concept. This innovative approach provides a solid foundation for the production of the Anti-Terrorism Responder (the primary product) and four other environmentally-based expert systems that provide key components for it.

Figure 2. Planes of Information
The expert system has three modules: the first for making sampling and QA/QC decisions, the second for selecting methods (based on information in the National Environmental Methods Index, NEMI), and the third for calculating how many samples will be needed to meet the user's objectives. All of the steps in EPA’s Data Quality Objective (DQO) Process are therefore incorporated, but not in the seven-step sequence of the DQO Process: when a monitoring program is planned systematically, some of the information is needed at a different point than the DQO Process sequence would supply it in order to make logical decisions. Primary and Secondary questions were developed for the software and examples of these are shown below.
Examples of Primary Questions
The primary questions revolve around the seven steps of US EPA's Data Quality Objective (DQO) Process. Examples of the types of questions included in the expert system are described below:
Step 1. State the Problem. The first question posed to a user is:
"Step 1 - Please state your objectives in the dialog box below - we'll review these again at the end of this systematic planning process to see if the results based upon your input to this expert system match your objectives. If you need help to state your objective(s) in a clear, concise, and measurable way, please review the information provided in the Explanation Section; it contains hyperlinks to additional detailed advice and examples."
Step 2. Identify the Decisions. The question about how the user plans to use the information is in two parts. First the user is asked to broadly categorize the kind of decision, and then a dialog box is provided in which to describe how the data will be used to make decisions. These questions are listed below:
"Step 3. Determining What Decisions Will be Made. Environmental analyses are conducted for a purpose. In other words, one or more decisions will be made based on the analytical results of the project. Here we will consider the general kinds of decisions that will be made; next we will consider the consequences of these decisions being wrong. Which of the general categories will your decisions fall into? Select all that apply."
The choices are:
and more than one choice can be selected. Next the second part of the question is presented using a dialog box to input the second answer:
"Step 3A. It is important that sampling plans clearly state how the data will be used as information for making decisions so that there is an understanding, in the context of gathering data by those working on the mission, of what will be relevant and what may not be relevant."
Step 3. Identify Inputs to the Decisions. Primary questions used in the expert system included asking the user which analytes need to be analyzed for, the action levels associated with the decisions, what detection levels need to be achievable for each analyte, etc. The questions involved in this portion are listed below:
Step 5B. The next step is to define exactly what you plan to analyze for. Analytical methods are very specific in this respect. The term "analyte" is used in EMMA to refer to any kind of target chemical (or biological organism) that you will be analyzing for. We will be using the National Environmental Methods Index (NEMI) as our primary tool to select the best method(s) to meet your objectives and provide data for your decisions. Some methods only are designed to analyze for one analyte at a time and others will cover over a hundred analytes. After reviewing the information in the Explanation Box please select all analyte subcategories that apply to your target analytes.
The current seven major categories in NEMI are listed as choices and the user may select more than one. The categories include General Methods, Sampling and Preparation Methods, Organic Methods, Inorganic Methods, Radiochemical Methods, Physical Methods, and Microbiological Methods. Following the user's selection the next question posed is more specific and allows input of specific analytes into a dialog box for further consideration:
Now that the analyte subcategories have been defined please list the specific analytes that you will be analyzing for. Since methods are specific for target analytes this list will be used when you search NEMI (or any other source) to determine if a method is appropriate for your use or whether it can be modified to make it acceptable for your use (under performance based measurement system - PBMS) guidelines.
Other questions involve the users' requirements with respect to detection levels, method specificity, accuracy, precision, and instrumentation either preferred or not wanted. Examples of these questions are listed below:
Method sensitivity has both specific and relative components. You must decide what concentration levels you need to achieve in order to meet your stated objectives. This is an important decision that affects not only the methods that will be available to choose from but also the numbers of samples you may have to analyze in order to achieve the confidence levels you desire. The more samples you need to analyze, of course, the greater will be your costs. The relative component of this decision refers to the closeness of your desired sensitivity level to the detection levels of the available methods. The closer the desired sensitivity level is to a method's "detection level" the greater is the number of samples (and thus larger costs) that will be required to meet your desired confidence levels. After reviewing the information in the Explanation Box please select which of the sensitivity levels, relative to the available methods, you are likely to need. Realize that, after reviewing the method summaries in NEMI (or any other source) that you may have to change your initial estimate.
Method Selectivity (also referred to as Method Specificity) is an important consideration because it directly affects the probability of detecting interferences in samples - especially in complex environmental samples. Interferences may cause increases or decreases in signals of target analytes and thus lead to false positive and/or false negative conclusions. Thus, your tolerance for false positive and/or false negatives in your data is closely related to sample characteristics and method selectivity. Please choose the relative selectivity you will need when evaluating method choices from NEMI (or any other source).
Accuracy is often considered to be an important characteristic of methods that are chosen to produce data. Bias causes inaccuracy in data and may come from many sources, including procedural sources and from contamination of samples. After reviewing information in the Explanation Box that relates to bias and accuracy and method selections please choose your relative tolerance for accuracy. This factor will be used when you evaluate potential methods from NEMI (or any other source).
Analytical instrumentation may be important in your selection. For example, if particular instrumentation such as ICP-MS is not available then you wouldn't want to consider methods that use that instrument. On the other hand, you may have a preference for some particular instrumentation because it is available, you feel it will meet your needs, you are familiar with it, etc. If you have either instrumentation that you want to avoid or instrumentation that you have a preference for please list either or both in the dialog box below. If you have no preference for instrumentation and are willing to consider all available methods, regardless of instrumentation, then type the statement, "Any instrumentation is acceptable."
Step 4. Define the Boundaries of the Study. Primary questions used in the expert system included asking the user to define the spatial and temporal limits of the physical site. However, an important improvement in this process is the inclusion of available resources (i.e., budget) in the definition of boundaries. This improvement was actually suggested by an EPA employee (who was referenced in the Explanation Box associated with this question).
The question associated with this portion of the expert system is:
Step 2 - C. Another boundary involves the amount of resources you have. Please state the budget for both sampling and analysis activities, INCLUDING all QA/QC sampling and analysis, that is available for this project. Add the two sub-budgets together to obtain the TOTAL budget and enter this information in the space below. (You may enter $ commas, words and numbers, etc.). Note: please open the "Checklist for EMMA" file in the Explanation Box now (to close "Checklist for EMMA" press the "Back Arrow" on your browser.) Print out this checklist (press the "control" plus letter "p" keys simultaneously to print) as we will be entering other data in it and then enter your budget into the answer line of Question #1 in the checklist.
The checklist referred to in this question was developed as a generic worksheet on which users can record important decisions that will affect the cost of the project. This checklist is used to reconcile the estimated cost of a project (after going through the systematic planning process) with the resources that are actually available. When the estimated cost exceeds the available budget, the user must make changes to the plans (usually resulting in fewer samples for analysis and, therefore, decreased confidence levels in the data). This checklist is a document that can be downloaded and printed from within the expert system and it is included as Appendix B in Section 7.0 of this final report.
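The reconciliation that this checklist supports can be illustrated with a short Python sketch. This is not code from EMMA or from the checklist itself: the function names, the single flat cost per sample, and the example figures in the usage note are all assumptions made purely for illustration.

```python
def total_budget(sampling_budget, analysis_budget):
    """Add the two sub-budgets, as the budget question asks the user to do."""
    return sampling_budget + analysis_budget


def estimated_cost(n_samples, n_qc_samples, cost_per_sample):
    """Estimated cost, assuming (hypothetically) one flat cost per sample.

    QC samples are counted too, because the budget must include all
    QA/QC sampling and analysis.
    """
    return (n_samples + n_qc_samples) * cost_per_sample


def affordable_samples(budget, n_qc_samples, cost_per_sample):
    """Largest number of environmental samples the budget can cover
    once the planned QC samples have been paid for."""
    return max(0, int(budget // cost_per_sample) - n_qc_samples)
```

For example, with hypothetical sub-budgets of 4000 and 6000 and a hypothetical cost of 300 per sample, a plan of 35 environmental samples plus 5 QC samples would exceed the total budget, and `affordable_samples(10000, 5, 300)` shows how far the plan would need to be cut back.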
Step 5. Develop Decision Rules. Most of the decision rules were built into the expert system. Once the user selects one of the choices presented, the expert system provides an appropriate answer based on IF/THEN rules. An example is provided below.
Step 2A - Media Choices. The boundaries of the area under investigation must be defined, i.e., deciding where the samples are to be taken. A coincidental consideration of where to collect samples is what to sample (for example, soil, industrial effluent, rain water, air, leaves, birds, etc.). There are five categories of samples to consider: (1) water, (2) Soil/Sediment, (3) Air, (4) Aquatic Biota, and (5) Other Biota. Please check all categories that apply. The following step(s) will consider typical matrices that are sampled within each of these five categories.
For example, IF the user selects Water THEN a list of seven different types of water matrices is presented. These include drinking water, ground water, municipal and industrial effluents, rivers and streams, lakes and ponds, etc. IF the user selects "Rivers and Streams" THEN the following advice is provided.
Rivers and streams are often stratified and you will have to take this into consideration when you plan your sampling. There are special techniques for sampling from bridges, from sides of streams, etc. and you will have to determine which, if any, of these you may want to use. In addition, decide whether you want to sample at various depths, at the surface only, or where in this fluid medium your samples will be most representative of the problem you are investigating. Also, timing of sampling for rivers and streams is important and we'll look at those factors shortly.
The user is not confined to only one medium or matrix selection; in fact they may all be selected if desired. This would lead to the system providing advice on sampling twenty-two specific matrices. In addition to the advice provided by the expert system IF/THEN rules, additional advice and information is always provided in an associated "Explanation Box."
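The IF/THEN behavior described above can be sketched as a simple lookup in Python. EMMA itself was built in the Exsys Corvid shell, so this is not its implementation; the dictionary and function names are hypothetical, the water list contains only the matrices named in the text (not all seven), and the advice string is abbreviated from the example above.

```python
# Hypothetical rule tables. Only the matrices named in the text are listed;
# the remaining water matrices and the other four media are not filled in.
MATRICES_BY_MEDIUM = {
    "Water": [
        "Drinking Water",
        "Ground Water",
        "Municipal and Industrial Effluents",
        "Rivers and Streams",
        "Lakes and Ponds",
    ],
}

ADVICE_BY_MATRIX = {
    "Rivers and Streams": (
        "Rivers and streams are often stratified; take this into account "
        "when planning where, at what depth, and when to sample."
    ),
}


def matrices_for(selected_media):
    """IF a medium is selected THEN offer its matrix choices."""
    choices = []
    for medium in selected_media:
        choices.extend(MATRICES_BY_MEDIUM.get(medium, []))
    return choices


def advice_for(selected_matrices):
    """IF a matrix is selected THEN return the advice attached to it."""
    return [ADVICE_BY_MATRIX[m] for m in selected_matrices if m in ADVICE_BY_MATRIX]
```

Because both functions accept lists, selecting several media or matrices at once simply accumulates the corresponding choices and advice, which matches the multi-select behavior described in the text.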
The associated Explanation Box is shown below in Figure 3.
Explanation
Numbers of Samples Obtained by Statistical Sampling
Remember that samples obtained using a statistical approach may consist of samples obtained from simple random sampling, or quasi-random sampling, or stratified random sampling, or ranked set sampling.
A convenient software tool to estimate numbers of samples needed to meet your statistical objectives is DQO-PRO.
Click Here to open DQO-PRO and then select Enviro-Calc, one of the three programs that comprise DQO-PRO. Click on "Help" within DQO-PRO (upper left on the title bar) to view the assumptions, equations, and references used to make your calculations. The important assumption we are making is that you will be estimating numbers of samples needed based on their average concentration in the samples that you will analyze.
To obtain overall precision you will need to analyze field replicate samples. These are two or more portions of a sample collected as close as possible at the same point in time and space so as to be considered identical. These QC samples are also called co-located samples and they are used to measure imprecision caused by inhomogeneity (non-even distribution) of the target analytes distributed in the environmental matrix.
If you do not have overall precision data from previous measurements then you must estimate it. In general, it may be assumed that imprecision increases as you go from water to solid matrices (for example soils). Imprecision may also increase with fluid matrices (e.g., water and air) that vary significantly over short periods of time (for example, fast moving streams versus a more static pond or lake or ambient air on a windy day). As imprecision increases, the relative standard deviation (RSD) will also increase and it is not unusual for overall relative standard deviations to be several times those of laboratory values. Thus, for example, a method with a RSD of 20% might have an overall 30% RSD when used to analyze well water samples, 40% RSD when used to analyze samples from a flowing stream, 50% RSD when used to analyze soil samples and 60% RSD when used to analyze ambient air samples.
If (as is often the case) you find that your budget is not large enough to enable you to obtain the confidence level you desired (this was determined in the first segment of EMMA and recorded as the answer to Question #2 in the Calculation Checklist for EMMA) then you can easily change your inputs and calculate what the confidence level would be based on either a higher tolerance level for error or a smaller number of samples. You do this by keeping the other values constant and changing either the tolerance level or the confidence level until you reach the approximate number of samples that your budget will accommodate.
Enter the number of samples that you plan to obtain using statistical sampling approaches in the answer for Question #4 in the Calculation Checklist for EMMA.
Figure 3. Example of Information Provided in an Explanation Box
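The trade-off described in this Explanation Box between overall precision, tolerable error, confidence level, and number of samples can be illustrated with a minimal Python sketch. It assumes a simple normal approximation for estimating a mean, n = (z × RSD / E)², which is a common textbook formula and not necessarily the equation used by Enviro-Calc (the Help menu in DQO-PRO lists the actual assumptions and equations). The function names and the 20% tolerable error are hypothetical; the RSD values are the example values quoted in the Explanation Box.

```python
import math
from statistics import NormalDist


def samples_needed(rsd_percent, tolerable_error_percent, confidence=0.95):
    """Approximate number of samples needed to estimate a mean concentration.

    Assumption: two-sided normal approximation, n = (z * RSD / E) ** 2,
    with RSD and tolerable error E both expressed in percent of the mean.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.ceil((z * rsd_percent / tolerable_error_percent) ** 2)


def achieved_confidence(n_samples, rsd_percent, tolerable_error_percent):
    """Confidence level reached with a fixed number of samples
    (same normal approximation, solved for the confidence level)."""
    z = tolerable_error_percent * math.sqrt(n_samples) / rsd_percent
    return 2.0 * NormalDist().cdf(z) - 1.0


# Overall RSD examples quoted in the Explanation Box text (percent).
EXAMPLE_OVERALL_RSD = {
    "well water": 30,
    "flowing stream": 40,
    "soil": 50,
    "ambient air": 60,
}

if __name__ == "__main__":
    for matrix, rsd in EXAMPLE_OVERALL_RSD.items():
        n = samples_needed(rsd, tolerable_error_percent=20)
        print(f"{matrix}: about {n} samples at 95% confidence")
```

Holding the RSD fixed and either lowering `confidence` or raising the tolerable error reduces the number of samples, while `achieved_confidence` answers the reverse question of what confidence a budget-limited number of samples would give. This mirrors the adjustment the Explanation Box asks the user to make in DQO-PRO.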
Step 6. Specify Tolerable Limits on Decision Errors. Primary questions used in the expert system included asking the user to set acceptable limits for measurement quality objectives (MQOs) such as precision, accuracy, rates of false positive and/or false negative decision errors, etc. and also for confidence levels in the sampling and analytical data that relate to the DQOs. These decision error limits are asked to be set relative to the consequences of exceeding them, i.e., human health effects, ecological health effects, engineering costs, analytical costs, etc. An example of this type of question is:
Precision is often deemed to be an important characteristic in selecting analytical methods. You must determine whether it is or not for your particular needs, as always, remembering that methods that produce higher precision may often be the more expensive ones. After reviewing the information on precision in the Explanation Box please select your tolerance for precision in terms of Relative Standard Deviation (RSD).
Step 7. Optimize the Design for Obtaining Data. Primary questions used in the expert system included asking the user if the answers (in terms of numbers and kinds of environmental samples, numbers and kinds of quality control samples, and costs and data quality associated with selected methods) match time and budget resources available. Using the Calculation Checklist for EMMA (Appendix B) provides a convenient way of summarizing this information and comparing it to a user's real budget.
Examples of Secondary Questions
Secondary question categories involve issues that go deeper into the decision making process. Secondary questions used in the expert system ask the user for information such as that in the two examples below:
Step 5 B - 1. Estimating the lowest concentration levels you need to analyze at is an extremely important decision because it will affect the available methods to choose from, the rates of false positive and false negative data you will have, your ability to composite samples or not, and the numbers of samples you will have to analyze to meet your data quality objectives. Each of these considerations will be evaluated later (although compositing has already been considered). The lowest value you select will be your detection level (DL) and, typically, the lowest value at which you can reliably measure concentrations of your target analyte(s) will be three or four times your DL. The closer a target analyte's concentration is to a method's DL the greater is the number of samples (and thus larger costs) that will be required to meet the confidence levels (which were determined in the first segment of EMMA) you need. After reviewing the information in the Explanation Box please select which range of sensitivity that you are likely to need. Realize that, after reviewing the method summaries in NEMI (or any other source) that you may have to change your initial estimate.
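The relationship in this question between the detection level (DL) and the lowest reliably measurable concentration can be expressed as a small Python sketch. The function names and the example concentrations in the usage note are hypothetical; the factors of three and four come from the question text above.

```python
def reliable_measurement_range(detection_level):
    """Range in which the lowest reliably measurable concentration
    typically falls: three to four times the detection level."""
    return 3 * detection_level, 4 * detection_level


def is_reliably_measurable(target_concentration, detection_level, factor=4):
    """True if the target concentration is at or above factor * DL.

    The default factor of 4 is the more conservative end of the
    'three or four times' rule of thumb quoted in the question.
    """
    return target_concentration >= factor * detection_level
```

For instance, with a hypothetical DL of 0.5 (in any concentration unit), `reliable_measurement_range(0.5)` gives the band between 1.5 and 2.0, so a target concentration of 1.0 would fall below it and would call for a more sensitive method or more samples.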
Step 5E. Analytical instrumentation may be important in your selection. For example, if particular instrumentation such as ICP-MS is not available then you wouldn't want to consider methods that use that instrument. On the other hand, you may have a preference for some particular instrumentation because it is available, you feel it will meet your needs, you are familiar with it, etc. If you have either instrumentation that you want to avoid or instrumentation that you have a preference for please list either or both in the dialog box below. If you have no preference for instrumentation and are willing to consider all available methods, regardless of instrumentation, then type the statement, "Any instrumentation is acceptable."
Educational Uses of EMMA
In mid-June a comprehensive one-day seminar on the development and use of the expert system was presented at Tennessee Technological University (TTU) and, at the same time, to environmental researchers from surrounding educational institutions, including the University of Tennessee at Chattanooga. Following the lectures, which included all of the material in the PowerPoint™ tutorial, the attendees conducted hands-on use of the expert system. They were provided six representative environmental sampling and analysis problems from two real world examples and used the expert system to develop the best solutions for them. The problems involved using the DQO Process and NEMI to find the best methods to use (or to start with in order to modify them using a PBMS approach). At the conclusion of this exercise each attendee was requested to fill out an evaluation form with numerical (quantitative) ratings and descriptive (subjective) comments. The questionnaire contained 7 questions directly related to the objectives of the project:
How to Experience EMMA Yourself
EMMA will be a commercial software program for use by private and government scientists to help save time and money during planning of environmental monitoring projects. However, the method selection module of EMMA is also available on the NEMI web site at www.nemi.gov so that people can use it for selecting the best methods in NEMI for their individual needs. A comment form is also provided on this site so that people can provide input to improve future versions of EMMA or to be added to a list of interested potential customers when the final version is available for sale.