Processing

The SCIMS Data Processing System is a group of scripts used to manipulate the raw data produced in the data collection phase. In the course of processing, a number of intermediate and final data products are produced that will be available for viewing on this page.  The scripts are written for MATLAB, and all data processing is done within MATLAB.

[Figure: flowchart of the SCIMS data processing system]

There are 9 steps in the system, each labeled with a number in the above flowchart (a larger version of the chart is also available).  The blue items in the chart represent data products available on this website. The steps are as follows:

  1. Raw data conversion: In this step, the data that comes from each ship in its local format is converted to a standardized format, the universal format. This is a comma-separated text file with 52 columns that contains data types from both the SIO and NOAA underway collection systems.  A list of the data in the universal format is available separately.
  2. Preprocessing: Here, a series of tests is run on the data and a flag is assigned to each point as an indication of the point's validity. There are currently 8 tests: impossible date, impossible time, impossible location, location mismatch between primary and secondary lat/long, global range, spike, gradient and flowrate.  Any point that fails a test is flagged with a 3 (questionable data). If a location or time stamp is bad, the whole row is flagged as 3, and if the flowrate is below a threshold value, the TSG and fluorometer values are flagged as 3. Note that on certain cruises the flowrate never rises above 1 or 2 LPM. The data for these cruises should be treated as suspect, but they are flagged as 1 so that the system can still plot them.  When this is the case, it is indicated on that cruise's web page.
  3. Station data merging: After the data has been converted to the universal format and preprocessed, it must be merged with CalCOFI CTD and bottle data.  This is necessary to cross-correlate the data and give a more accurate representation of the oceanographic conditions.  In this step, the CTD/bottle data and the universal underway data are linked by time stamp for each station.
  4. Cross correlation: Once the surface data has been merged, a linear least squares fit is performed for temperature and salinity.  The error for these two variables is typically very low and the fit is very good (R squared generally above 0.98). For fluorescence, a log-linear least squares fit is used.  Due to the inefficiency of continuous fluorometers, the error for the fluorescence fit is moderate, and the fit ranges from poor to acceptable (R squared of roughly 0.75 to 0.85).  Finally, the chlorophyll A data is obtained using a 2nd order polynomial in two variables: irradiance (PAR) and fluorescence.  The fit is decent for such a complex system and the residuals are generally randomly distributed.
  5. Correlation plots: Once the correlation has been performed, three plots are made for each variable: predicted vs observed with the fit, residuals vs observed, and predicted vs corrected.
  6. Application of the correlation coefficients: With the newly created model from station data, the rest of the universal data must be corrected.  This step applies the models to the universal data.
  7. Bin averaging: Once the data has been corrected, it can be reduced in size and smoothed.  This is done by averaging the 30-second resolution data over periods of 10 minutes. This results in about 2800 points per cruise, few enough to fit in one file.
  8. Plotting variables on cruise track: The variables are now properly processed, and at this point we need to be able to visualize the data. For each plot, the cruise track is plotted on a coastline map. The cruise track is about 6 pixels wide, and the color of the track at each point depends on the value of the z variable (temperature, salinity, fluorescence or chlorophyll A).  Data that is flagged as invalid or questionable is not plotted.
  9. Metadata extraction: This step is not part of the processing cycle, but it is still necessary.  The metadata from SIO are presented in one file for each day. These files are identical from day to day. The metadata extractor parses one such file and puts it into a universal metadata format.  For NOAA cruises, the metadata must be compiled by hand, as it exists in a variety of nonstandard formats and files.
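
The flagging logic of step 2 can be illustrated with a short sketch. It is written in Python for illustration only and is not the SCIMS MATLAB code: the function names, the thresholds, and the spike rule (distance from the mean of the two neighbouring points) are all assumptions. Only the flag values 1 and 3 come from the description above.

```python
GOOD = 1          # flag value the page uses for data that is still plotted
QUESTIONABLE = 3  # flag value the page assigns when a test fails

def global_range_flags(values, lower, upper):
    """Flag each value 3 if it lies outside [lower, upper], else 1."""
    return [GOOD if lower <= v <= upper else QUESTIONABLE for v in values]

def spike_flags(values, max_jump):
    """Flag an interior point 3 if it departs from the mean of its two
    neighbours by more than max_jump (an assumed spike definition)."""
    flags = [GOOD] * len(values)
    for i in range(1, len(values) - 1):
        neighbour_mean = (values[i - 1] + values[i + 1]) / 2.0
        if abs(values[i] - neighbour_mean) > max_jump:
            flags[i] = QUESTIONABLE
    return flags

def flowrate_flags(flow_lpm, min_flow_lpm):
    """Flag a sample 3 when the flowrate is below the threshold."""
    return [GOOD if f >= min_flow_lpm else QUESTIONABLE for f in flow_lpm]

def combine_flags(*flag_lists):
    """Keep the worst (largest) flag raised for each point."""
    return [max(point_flags) for point_flags in zip(*flag_lists)]

# Hypothetical temperature (deg C) and flowrate (LPM) samples.
temperature = [15.0, 15.1, 22.0, 15.2, 15.3, 15.4]
flow = [2.0, 2.0, 2.0, 0.4, 2.0, 2.0]

flags = combine_flags(
    global_range_flags(temperature, -2.5, 40.0),
    spike_flags(temperature, 5.0),
    flowrate_flags(flow, 1.0),
)
print(flags)
```

Taking the maximum flag per point means that a single failed test is enough to mark the point as questionable, which matches the "if any point fails a test" rule.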
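
Linking station data to underway data by time stamp (step 3) might look roughly like the following Python sketch. The page does not say how the match is made, so the nearest-record rule, the tolerance, and the names `nearest_record` and `merge_stations` are assumptions made for illustration. Times are plain seconds to keep the example self-contained.

```python
from bisect import bisect_left

def nearest_record(underway_times, station_time, tolerance):
    """Return the index of the underway record closest in time to
    station_time, or None if nothing lies within tolerance.
    underway_times must be sorted in ascending order."""
    if not underway_times:
        return None
    pos = bisect_left(underway_times, station_time)
    # Only the records just before and just after can be the closest.
    candidates = [i for i in (pos - 1, pos) if 0 <= i < len(underway_times)]
    best = min(candidates, key=lambda i: abs(underway_times[i] - station_time))
    if abs(underway_times[best] - station_time) <= tolerance:
        return best
    return None

def merge_stations(underway_times, underway_values, stations, tolerance):
    """Pair each (time, bottle_value) station with its underway value.
    Stations with no underway record within tolerance are skipped."""
    merged = []
    for station_time, bottle_value in stations:
        idx = nearest_record(underway_times, station_time, tolerance)
        if idx is not None:
            merged.append((station_time, underway_values[idx], bottle_value))
    return merged

# Hypothetical 30-second underway samples and two station casts.
times = [0, 30, 60, 90, 120]
underway_temp = [15.0, 15.1, 15.2, 15.3, 15.4]
stations = [(70, 15.25), (500, 16.0)]

print(merge_stations(times, underway_temp, stations, tolerance=15))
```

The second station has no underway record within 15 seconds, so it is left out of the merged set rather than being paired with distant data.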

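The fits in steps 4 and 6 can be sketched as below, in Python for illustration (the SCIMS scripts themselves are MATLAB and are not reproduced here). The function names are hypothetical. "Log-linear" is read here as a linear fit of the natural log of the observed value against the underway value, which is an assumption about the method rather than something the page states.

```python
import math

def linear_fit(x, y):
    """Ordinary least squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def log_linear_fit(x, y):
    """Fit ln(y) = slope * x + intercept (assumed reading of log-linear)."""
    return linear_fit(x, [math.log(yi) for yi in y])

def apply_linear(values, slope, intercept):
    """Step 6: correct underway values with the fitted coefficients."""
    return [slope * v + intercept for v in values]

# Hypothetical underway (x) and bottle (y) temperatures at four stations.
underway = [14.0, 15.0, 16.0, 17.0]
bottle = [14.3, 15.2, 16.3, 17.2]

slope, intercept = linear_fit(underway, bottle)
corrected = apply_linear(underway, slope, intercept)
print(slope, intercept, r_squared(bottle, corrected))
```

The same `apply_linear` call can then be run over the full underway record, not just the station points, which is what step 6 describes.
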
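The bin averaging of step 7 reduces 30-second samples to 10-minute means, so each full bin holds 20 samples. A minimal Python sketch follows; the function name and the choice to label each bin by its start time are assumptions, not details taken from the SCIMS scripts.

```python
def bin_average(times_s, values, bin_seconds=600):
    """Average values into fixed-width time bins.

    times_s are sample times in seconds; each sample goes into the bin
    starting at the largest multiple of bin_seconds not after it.
    Returns a list of (bin_start_seconds, mean_value), sorted by time."""
    sums = {}
    counts = {}
    for t, v in zip(times_s, values):
        start = int(t // bin_seconds) * bin_seconds
        sums[start] = sums.get(start, 0.0) + v
        counts[start] = counts.get(start, 0) + 1
    return [(start, sums[start] / counts[start]) for start in sorted(sums)]

# Twenty minutes of hypothetical 30-second samples.
times = [30 * i for i in range(40)]
values = [float(i) for i in range(40)]

print(bin_average(times, values))
```

In practice, points flagged as questionable would presumably be dropped before averaging so that they do not contaminate the bin means.
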
Eventually, this system will be upgraded to a more complex database system that allows for both data mining and searching; this will be phase 2.  Additionally, integration with the CalCOFI event numbering is planned soon.