Indiana Geological Survey, a research institute of Indiana University
 
 
 

Statistical Theory


The primary statistical procedures to be employed in the modelling program are Analysis of Variance (ANOVA) and the t-test. These methods of data analysis have been studied and refined by theoretical and applied statisticians for decades and have been shown to be robust in the face of problems associated with water quality variables (for example, non-normality of population distribution and inequality of sample variances; Montgomery and Loftis, 1987). Any parametric statistical procedure will be highly sensitive to spatial and temporal autocorrelation, however. Random sampling of monitoring locations may prevent problems associated with spatial autocorrelation, but complications due to seasonality (a temporal factor) are more difficult to correct.

In ANOVA, each observation collected at an individual sampling point has the following basic structure:

$$Y_{ij} = \mu_i + e_{ij} \qquad (1)$$

where, in the context of the present study, $Y_{ij}$ represents a measure of pesticide occurrence or concentration in the $j$th monitoring well of the $i$th population (monitoring subunit), $\mu_i$ is the mean of the $i$th population, and $e_{ij}$ is the deviation of $Y_{ij}$ from its population mean (assumed to be a random variable having a mean of zero and a variance of $\sigma^2$). An estimate of the population mean is given by:

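$$\bar{Y}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij} \qquad (2)$$

where $\bar{Y}_i$ is an estimate of $\mu_i$ and $n_i$ is the number of monitoring wells sampled in the $i$th population. An estimate of the population variance is given by:

$$s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} e_{ij}^2 \qquad (3)$$

where $s_i^2$ is an estimate of $\sigma_i^2$ and each deviation $e_{ij}$ is evaluated as $Y_{ij} - \bar{Y}_i$. These calculations are made for each of the populations contained in the monitoring network.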
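Equations (2) and (3) can be illustrated with a short Python sketch. The function name `subunit_summary`, the subunit labels, and the concentration values are hypothetical and chosen only for illustration; they are not taken from the monitoring program.

```python
from math import fsum


def subunit_summary(values):
    """Return (n, mean, variance) for the measurements of one monitoring subunit."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two wells are needed to estimate a variance")
    mean = fsum(values) / n                                  # equation (2)
    residuals = [y - mean for y in values]                   # deviations about the sample mean
    variance = fsum(e * e for e in residuals) / (n - 1)      # equation (3)
    return n, mean, variance


# Hypothetical concentrations for two subunits (illustrative values only).
wells = {
    "subunit_A": [0.12, 0.30, 0.18, 0.25],
    "subunit_B": [0.05, 0.07, 0.04],
}

# The calculation is repeated for each population in the network.
summaries = {name: subunit_summary(values) for name, values in wells.items()}
for name, (n, mean, variance) in summaries.items():
    print(f"{name}: n={n}, mean={mean:.4f}, variance={variance:.6f}")
```

The denominator `n - 1` matches equation (3); a subunit with a single well is rejected because its variance cannot be estimated.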

To test whether a particular monitoring subunit has an occurrence or concentration of pesticides above some specified level of tolerance, the following t-ratio will be computed:

$$t = \frac{\bar{Y}_i - \mu_{tol}}{SE_{\bar{Y}}} \qquad (4)$$

where $\mu_{tol}$ is a specified average occurrence or concentration that is considered to be a maximum tolerable amount by the State Chemist (or other authority), and $SE_{\bar{Y}}$ is an estimate of the standard deviation of the sampling distribution of the means (a function of the sample size $n_i$ and variance $s_i^2$). The computed t-ratios are compared to ordinates of Student's t-distribution with $n_i - 1$ degrees of freedom and a specified confidence level.
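As a minimal sketch of equation (4), the Python below computes the t-ratio for one subunit. It assumes the conventional standard error of the mean, the square root of the sample variance divided by the sample size, which the text describes only as "a function of" those two quantities. The function names, the sample values, and the tolerance level are hypothetical, and the critical value must be supplied from a table of Student's t-distribution.

```python
from math import sqrt
from statistics import mean, variance


def tolerance_t_ratio(values, mu_tol):
    """Return (t, degrees_of_freedom) for equation (4).

    Assumes SE = sqrt(s^2 / n), the usual standard error of a sample mean.
    """
    n = len(values)
    s2 = variance(values)          # sample variance with an n - 1 denominator
    if s2 == 0:
        raise ValueError("sample variance is zero; the t-ratio is undefined")
    se = sqrt(s2 / n)
    return (mean(values) - mu_tol) / se, n - 1


def exceeds_tolerance(values, mu_tol, t_critical):
    """One-sided check: is the subunit mean significantly above mu_tol?"""
    t, _ = tolerance_t_ratio(values, mu_tol)
    return t > t_critical


# Hypothetical data: four wells and a tolerance level of 2.0.
sample = [1.0, 2.0, 3.0, 4.0]
t, df = tolerance_t_ratio(sample, 2.0)
print(f"t = {t:.4f} with {df} degrees of freedom")
# 2.353 is the tabulated one-sided 95 percent value for 3 degrees of freedom.
print(exceeds_tolerance(sample, 2.0, t_critical=2.353))
```

The test is written as one-sided because the question posed is whether the mean lies above the tolerance level; a two-sided comparison would use a different tabulated value.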

To compare the occurrences and (or) concentrations of pesticides in two monitoring subunits during the same time period, the t-ratio takes the following form:

$$t = \frac{\bar{Y}_2 - \bar{Y}_1}{SE_{\bar{Y}p}} \qquad (5)$$

where $\bar{Y}_1$ and $\bar{Y}_2$ are the sample means of the two monitoring subunits, and $SE_{\bar{Y}p}$ is a pooled estimate of the standard deviation of the sampling distribution of the means (a function of the variances and sample sizes of the monitoring subunits being compared). The computed t-values can be compared to ordinates of Student's t-distribution with $n_1 + n_2 - 2$ degrees of freedom and a specified confidence level.
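The comparison in equation (5) can be sketched as follows. The pooled standard error is assumed here to take the textbook equal-variance form, in which the two sample variances are weighted by their degrees of freedom; the text does not spell out the formula. The function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_ratio(sample_1, sample_2):
    """Return (t, degrees_of_freedom) for equation (5).

    Assumes the standard pooled-variance form:
        s_p^2 = ((n1 - 1) s1^2 + (n2 - 1) s2^2) / (n1 + n2 - 2)
        SE    = sqrt(s_p^2 * (1/n1 + 1/n2))
    """
    n1, n2 = len(sample_1), len(sample_2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(sample_1)
                  + (n2 - 1) * variance(sample_2)) / df
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; the t-ratio is undefined")
    se_pooled = sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean(sample_2) - mean(sample_1)) / se_pooled, df


# Hypothetical measurements from two subunits in the same time period.
subunit_1 = [1.0, 2.0, 3.0]
subunit_2 = [2.0, 4.0, 6.0]
t, df = pooled_t_ratio(subunit_1, subunit_2)
print(f"t = {t:.4f} with {df} degrees of freedom")
```

The sign of the result follows equation (5): a positive value means the second subunit has the larger sample mean.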

When comparisons are made over time (that is, to determine any temporal trends), seasonality may influence the results. Ideally, the comparisons will be made using data collected during the same seasons, in which case the procedure described above would be appropriate for determining whether a temporal change in the pesticide occurrence or concentration in ground water has occurred. If, on the other hand, the samples to be compared were not collected during the same seasons, then the model changes from (1) to:

$$Y_{ikj} = \mu_i + \tau_k + e_{ikj} \qquad (6)$$

where $Y_{ikj}$ is a measure of pesticide occurrence or concentration in the $j$th well of the $i$th monitoring unit during the $k$th season, $\mu_i$ is the mean of the $i$th monitoring unit, $\tau_k$ is an unknown parameter representing the effect of seasonality on the sample mean, and $e_{ikj}$ is a random variable having a mean of zero and a variance of $\sigma^2$. In this situation an estimate of $\tau_k$ will need to be made (this will involve an analysis of data collected over a sufficiently long period of time; for example, 2 years), and the appropriate tests would involve calculation of F-ratios and comparison with the ordinates of Snedecor's F-distribution (the details are not presented here but are available in Dixon and Massey [1957]).
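One simple way to estimate the seasonal term in equation (6) is sketched below. It is an illustration under stated assumptions, not the procedure of Dixon and Massey: it assumes every monitoring unit was sampled in every season with the same number of wells, and it constrains the seasonal effects to sum to zero so that each effect is the season mean minus the grand mean. The function names, unit labels, and values are hypothetical.

```python
from statistics import mean


def seasonal_effects(cells):
    """Estimate one additive effect per season.

    `cells` maps (unit, season) -> list of measurements. Assumes a complete,
    balanced layout and effects that sum to zero across seasons.
    """
    units = sorted({unit for unit, _ in cells})
    seasons = sorted({season for _, season in cells})
    missing = [(u, k) for u in units for k in seasons if (u, k) not in cells]
    if missing:
        raise ValueError(f"layout is incomplete; missing cells: {missing}")
    cell_mean = {key: mean(values) for key, values in cells.items()}
    grand_mean = mean(list(cell_mean.values()))
    return {
        k: mean([cell_mean[(u, k)] for u in units]) - grand_mean
        for k in seasons
    }


def remove_seasonality(cells, effects):
    """Subtract the estimated seasonal effect from every measurement."""
    return {
        (unit, season): [y - effects[season] for y in values]
        for (unit, season), values in cells.items()
    }


# Hypothetical measurements for two units sampled in two seasons.
data = {
    ("A", "spring"): [1.0, 3.0], ("A", "fall"): [3.0, 5.0],
    ("B", "spring"): [2.0, 4.0], ("B", "fall"): [4.0, 6.0],
}
effects = seasonal_effects(data)
adjusted = remove_seasonality(data, effects)
print(effects)
```

Once the seasonal effects are removed, samples collected in different seasons are on a common footing, although a formal test would still rely on the F-ratios mentioned in the text.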

Because of the logistical difficulties of achieving a truly random distribution of sampling sites, there is a strong possibility that the sites will exhibit some degree of nonrandom clustering. When a sampling population exhibits such clustering, the error terms in equations (1) and (6) do not have the desirable property of independence. This phenomenon is referred to as "spatial autocorrelation of the error term." Its consequence is that the standard errors in equations (4) and (5) will be underestimated if conventional procedures are employed. The resulting t-ratios will be inflated, and spurious conclusions will be reached concerning mean levels of concentration and their differences between monitoring subunits.

Cliff and Ord (1975) provide a method for calculating t-ratios in the face of spatially autocorrelated data. Their method, which is employed in the software application we have developed, involves calculation of spatial autocorrelation coefficients associated with each subpopulation. To do this, a matrix of weights must be developed that depends on the distances between sample points within each monitoring subunit. This is because points that are clustered (closely spaced) are likely to be somewhat redundant and similar in water quality, whereas points that are widely spaced are likely to be more independent of one another in their water-quality characteristics.

Once a complete set of locational data has been gathered by IDEM and provided to IGS, those data will be subjected to Nearest Neighbor Analysis (Theakstone and Harrison, 1970). The Nearest Neighbor Analysis will allow identification of those monitoring networks that are most likely to exhibit spatial autocorrelation. The locational data will then be used to generate the weighting matrices that must be incorporated into the statistical analysis of those nonrandomly distributed monitoring subnetworks.
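The two location-based steps described above can be sketched in Python. This is an illustration under assumptions, not the application's implementation: the nearest-neighbor index is computed as the commonly used ratio of the observed mean nearest-neighbor distance to the distance expected for a random pattern of the same density, and the weights are taken to be row-standardized inverse distances. The text states only that the weights depend on distance, so the weighting scheme, the function names, and the coordinates are all hypothetical.

```python
from math import dist, sqrt


def nearest_neighbor_index(points, area):
    """Ratio of observed to expected mean nearest-neighbor distance.

    Values well below 1 suggest clustering, values near 1 a random pattern,
    and values above 1 a more regular spacing.
    """
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are required")
    nearest = [
        min(dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    observed = sum(nearest) / n
    expected = 0.5 / sqrt(n / area)   # random pattern with density n / area
    return observed / expected


def inverse_distance_weights(points):
    """Row-standardized inverse-distance weights (an assumed weighting scheme)."""
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue              # a point carries no weight on itself
            d = dist(points[i], points[j])
            if d == 0:
                raise ValueError("coincident points have undefined weights")
            weights[i][j] = 1.0 / d
        row_sum = sum(weights[i])
        weights[i] = [w / row_sum for w in weights[i]]
    return weights


# Hypothetical well locations at the corners of a unit square,
# inside a monitoring subunit with an area of 4 square units.
wells = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
index = nearest_neighbor_index(wells, area=4.0)
weights = inverse_distance_weights(wells)
print(f"nearest-neighbor index = {index:.3f}")
```

Closely spaced wells receive larger mutual weights, which reflects the point made in the text that clustered wells are partly redundant. The index is sensitive to the area supplied, so the boundary of each monitoring subunit needs to be defined consistently.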


References

Cliff, A.D., and J.K. Ord, 1975: The comparison of means when samples consist of spatially autocorrelated observations. Environment and Planning, 7A: 725-734.

Dixon, W.J., and F.J. Massey, 1957: Introduction to Statistical Analysis. McGraw-Hill, New York.

Montgomery, R.H., and J.C. Loftis, 1987: The applicability of the t-test for detecting trends in water quality variables. Water Resources Bulletin, 23(4): 653-666.

Theakstone, W.H., and C. Harrison, 1970: The Analysis of Geographical Data. Heinemann Ltd., London.



Copyright © 2001 The Trustees of Indiana University