Importing data compiled by Dr. Don Sada of the Desert Research Institute
Over the past year, the Springs Stewardship Institute (SSI) imported an Excel dataset of 2,644 survey records provided by Dr. Sada. These data were compiled over nearly three decades and offer an irreplaceable record of springs ecosystems and springsnail distribution. We were honored to participate in this important project. All data are stored at Springs Online under a project titled "Sada Import".
Quality control
As a quality control check, our Data Technician, Joseph Holway, recently reviewed 52 lines of survey data from the imported dataset, selected using a random-number formula in Excel. For each of the 52 surveys, 20 data fields from the original spreadsheet provided by Dr. Sada were checked against SSI's data entry, for a total of 1,040 checked data points. Of the 1,040 points, 5 were errors and 1,035 were correct, indicating that SSI’s migration of data is 99.5% ± 0.5% accurate.
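The arithmetic behind this check can be sketched in a few lines of Python. This is an illustration only, not the procedure SSI used: the sample was actually drawn in Excel, the seed below is arbitrary, and the margin is computed with a normal approximation to the binomial at 95% confidence, which is an assumption because the source does not say how the ± 0.5% was derived. Under that assumption the margin comes out to roughly 0.4 percentage points, in line with the rounded figure reported above.

```python
import math
import random

TOTAL_RECORDS = 2644   # survey records in the imported dataset
SAMPLE_SIZE = 52       # surveys reviewed
FIELDS_PER_SURVEY = 20 # data fields checked in each survey
ERRORS_FOUND = 5

# Draw a simple random sample of row numbers without replacement.
# The seed is arbitrary and only makes this sketch reproducible.
rng = random.Random(0)
sample_rows = sorted(rng.sample(range(1, TOTAL_RECORDS + 1), SAMPLE_SIZE))

def accuracy_with_margin(correct, total, z=1.96):
    """Return (accuracy, margin) using a normal approximation (assumed method)."""
    p = correct / total
    margin = z * math.sqrt(p * (1 - p) / total)
    return p, margin

checked = SAMPLE_SIZE * FIELDS_PER_SURVEY  # 1,040 data points
accuracy, margin = accuracy_with_margin(checked - ERRORS_FOUND, checked)
print(f"{accuracy:.1%} +/- {margin:.1%}")
```

With so few errors, the normal approximation is rough; an exact or Wilson interval would be a reasonable alternative.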
Some of the errors resulted from contradictions in the original dataset, primarily records in which Pyrgulopsis was listed as occurring while "absent" or "extirpated" was noted in a separate field. We noted some of these records during the import process, but apparently did not find every instance. We assumed that these species were listed on the survey because they historically occurred at the site, but were not observed during the survey. We have deleted the erroneous occurrence records that we found, although some may remain. Joseph also found occurrence records for unidentified fish species that did not import properly.
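A check for this kind of contradiction could be automated along the following lines. This is a minimal sketch under stated assumptions: the field names `survey_id`, `taxon`, and `status`, and the sample records, are hypothetical and do not reflect the actual layout of Dr. Sada's spreadsheet or the Springs Online database.

```python
# Status values that contradict an occurrence record (taken from the text above).
CONTRADICTORY_STATUSES = {"absent", "extirpated"}

def find_contradictions(records):
    """Return records that list a taxon as occurring but carry a contradictory status."""
    flagged = []
    for rec in records:
        # Normalize the status so " Absent " and "absent" compare equal.
        status = (rec.get("status") or "").strip().lower()
        if rec.get("taxon") and status in CONTRADICTORY_STATUSES:
            flagged.append(rec)
    return flagged

# Hypothetical example records, for illustration only.
records = [
    {"survey_id": 1, "taxon": "Pyrgulopsis sp.", "status": "present"},
    {"survey_id": 2, "taxon": "Pyrgulopsis sp.", "status": "Extirpated"},
    {"survey_id": 3, "taxon": "Pyrgulopsis sp.", "status": " absent "},
    {"survey_id": 4, "taxon": "Pyrgulopsis sp.", "status": None},
]

flagged = find_contradictions(records)
print([rec["survey_id"] for rec in flagged])
```

Flagged records would still need human review, since the status note may be correct and the occurrence wrong, or the reverse.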
This QAQC procedure did not address accuracy of georeferencing information, which was the first step in the data import process, and which presented the greatest challenge with data migration. Prior to initiation of this project, many springs locations had already been included in the Springs Online database from SSI research, data imports from USGS databases (NHD and Geonames), or reported by land managing agencies or independent researchers. We attempted to match locations from the imported dataset with those already listed in the database.
USGS georeferencing of springs often is inaccurate: spring names are often missing or misspelled, and locations are commonly incorrect. Similarly, GPS location data collected by researchers or agency technicians often are inaccurate for many reasons, and names are often missing or misspelled. GPS readings during the 1990s and 2000s were often unreliable, and UTM recording errors also commonly occurred. Presumably, some locations were derived from USGS topographic maps, requiring a conversion from NAD27 to NAD83. A few GPS coordinates were missing or mapped well outside the study area. In several instances, site names and identifying numbers among the imported data were repeated, but were associated with different GPS coordinates or physical characteristics.
Therefore, errors in both the original data and SSI’s interpretation of those data, combined with the tight clustering of springs in some areas, resulted in uncertainty about location match-ups. When in doubt, we imported a new location rather than matching it with one already in the database. While this practice may have created duplicate records, duplicates are easier to resolve than erroneous matches. The most efficient way to resolve location questions would be a one-on-one meeting with Dr. Sada.
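The "when in doubt, import a new location" rule can be illustrated with a short sketch. It is not SSI's actual matching procedure, which relied on judgment about names, coordinates, and site characteristics. The 100 m threshold, the function names, and the sample coordinates are all hypothetical, and the distance is a great-circle (haversine) estimate on a spherical Earth.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def match_or_new(site, existing, max_dist_m=100.0):
    """Return the id of the only existing site within max_dist_m, else None.

    None means "import as a new location": either nothing is close enough,
    or several candidates are (tightly clustered springs), so the match is
    ambiguous. The threshold is an illustrative assumption.
    """
    nearby = [
        e for e in existing
        if haversine_m(site["lat"], site["lon"], e["lat"], e["lon"]) <= max_dist_m
    ]
    return nearby[0]["id"] if len(nearby) == 1 else None

# Hypothetical existing locations: A and B are clustered, C is isolated.
existing = [
    {"id": "A", "lat": 36.0000, "lon": -116.0},
    {"id": "B", "lat": 36.0003, "lon": -116.0},
    {"id": "C", "lat": 36.5000, "lon": -116.5},
]

clear_match = match_or_new({"lat": 36.5001, "lon": -116.5}, existing)  # only C is nearby
ambiguous = match_or_new({"lat": 36.0001, "lon": -116.0}, existing)    # A and B both nearby
no_match = match_or_new({"lat": 37.0, "lon": -117.0}, existing)        # nothing nearby
print(clear_match, ambiguous, no_match)
```

Returning no match for ambiguous cases trades some duplicate locations for fewer wrong matches, which mirrors the reasoning above.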
There also were discrepancies in land unit designation between the two datasets. Generally, we deferred to state and federal GIS layers for land unit designations; however, such discrepancies also could have resulted from inaccurate location data, or from mismatching sites in the imported dataset with sites already reported in the database.