Question
How did the current concepts about "data quality" develop?
 
Answer

Waste programs depend on analytical methods to detect and quantify low or trace contaminant concentrations in very complex matrices, such as soils, sediments, waste materials, and natural waters. No such technologies existed early in EPA’s history; they were developed and standardized in the 1970s and 1980s. The expectation at the time was that site data would be more reliable if analytical procedures were more sophisticated and their operation was standardized. This concept became codified through EPA guidance, state regulations, and laboratory certification programs that laid down strict requirements for which analytical methods were acceptable and how they were to be performed. "Data quality" was defined in terms of compliance with these requirements, and "definitive" and "screening" levels of data quality were distinguished according to the rigor of the analytical method and its associated QC.

Although this first-generation data quality model made sense at the time, practitioners discovered that it had fatal flaws. One critical oversimplification is that contaminants are heterogeneously distributed throughout environmental matrices at both small and large spatial scales. Spatial contaminant heterogeneity may be random or strongly patterned by pollutant release and migration mechanisms. Either way, its existence makes it difficult to reliably extrapolate the results of tiny 1- or 2-gram analytical samples back to the tons of matrix from which the samples came. Expensive analyses accurate to two decimal places on 1-gram samples are not particularly useful if two 1-gram samples taken from the same sample jar, or collected only 1 foot apart in the field, differ by orders of magnitude. Heterogeneity also complicates the design of analytical methods: variations in composition and particle size can alter the efficiency of sample preparation procedures. Acceptable performance of a standardized method on idealized matrices such as clean sand and reagent water cannot be presumed to predict equivalent method performance on real-world samples. Enough project experience has accumulated for the environmental community to recognize that generating and interpreting pollutant data is more complicated than originally expected. The first-generation data quality model does not accommodate real-world variability well enough to support efficient projects.
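To make the scale mismatch concrete, the minimal simulation below illustrates how a matrix whose contamination rides on a few rare "hot" particles can produce orders-of-magnitude disagreement between identical 1-gram subsamples. All values (particle size, concentrations, hot-particle frequency) are hypothetical assumptions chosen for illustration, not figures from the source.

```python
import random
import statistics

random.seed(42)

# Hypothetical heterogeneous matrix: 100 g modeled as 1-mg particles.
# Most particles carry near-background contamination, but a rare
# fraction are contaminant-rich "nuggets". All numbers are illustrative.
N_PARTICLES = 100_000      # 100 g of matrix as 1-mg particles
BACKGROUND_PPM = 0.5       # concentration of a typical particle
HOT_PPM = 5_000.0          # concentration of a rare hot particle
HOT_FRACTION = 0.0005      # ~1 in 2,000 particles is "hot"

matrix = [HOT_PPM if random.random() < HOT_FRACTION else BACKGROUND_PPM
          for _ in range(N_PARTICLES)]
true_mean = statistics.fmean(matrix)  # bulk concentration we want to estimate

def analyze_subsample(grams=1.0):
    """Mean concentration of a randomly drawn subsample of 1-mg particles."""
    n = int(grams * 1000)  # grams -> number of 1-mg particles
    return statistics.fmean(random.sample(matrix, n))

results = [analyze_subsample(1.0) for _ in range(10)]
print(f"true bulk concentration: {true_mean:.2f} ppm")
for i, r in enumerate(results, 1):
    print(f"1-g subsample {i}: {r:.2f} ppm")
print(f"max/min ratio across subsamples: {max(results) / min(results):.1f}x")
```

With roughly 0.5 hot particles expected per 1-gram draw, some subsamples contain none (reading near 0.5 ppm) while others contain one or more (reading 5 ppm or higher), so results from the "same jar" routinely spread across an order of magnitude even though each individual analysis is perfectly accurate for the grams it received.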