Bioclinical Data Services Inc.

 



1933 Coltrane Place, Escondido, CA 92027, (760) 738-4958

 


Validation

Are you ready to risk the outcome of a clinical trial without an outside independent evaluation?  At BCDS, we can provide complete validation of SAS software, data, data set creation and algorithms used to calculate efficacy and other key parameters in a trial.

Here is a small sample of problems that we encounter in a typical clinical trial:

Myth #1 A reliable method of validating a data listing consists of the following: Select five patients at random and compare their output against the raw data.

Assuming you only had sufficient time to peruse the data for five patients, you would definitely NOT choose them at random. Effective testing is the process of attempting to find errors. You are more likely to find inaccuracies with a test plan that actively seeks out "problem" cases.  For example, the following records would be good candidates for inspection:

  1. If the report is sorted on a "by" variable such as Investigator, scan the first or last record in that "by" group.  Many SAS coding errors occur on boundaries due to faulty "if first." or "if last." processing.
  2. Search for records that contain very large or very small values for a particular field.  The analyst may not have allocated sufficient space, resulting in an overflow error.  Similarly, rounding errors can be problematic for very small values.
  3. Be wary of records that span more than one page.  Titles and/or footnotes may be lost and column overflow may create a page that is difficult to interpret.  
  4. Inspect the first and last records in the report.  Improper initialization can result in faulty processing for the first record.  Likewise, faulty placement of "eof" labels and "end" SAS constructs can easily cause the last record to be dropped from the report.
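
The selection rules above can be sketched in a few lines. This is only an illustration, written in Python rather than SAS so that it can be run anywhere; the function name `pick_test_records`, the field names and the sample rows are all hypothetical. It covers items 1, 2 and 4 (group boundaries, extreme values, first and last records); records that span pages (item 3) depend on the report layout and are not modeled here.

```python
def pick_test_records(rows, by_key, value_key):
    """Return sorted indices of rows worth inspecting first.

    Assumes rows is already sorted by by_key, as it would be in the report.
    """
    if not rows:
        return []
    picked = {0, len(rows) - 1}  # first and last records of the report
    for i, row in enumerate(rows):
        # First record of each "by" group
        if i == 0 or rows[i - 1][by_key] != row[by_key]:
            picked.add(i)
        # Last record of each "by" group
        if i == len(rows) - 1 or rows[i + 1][by_key] != row[by_key]:
            picked.add(i)
    values = [row[value_key] for row in rows]
    picked.add(values.index(max(values)))  # largest value in the field
    picked.add(values.index(min(values)))  # smallest value in the field
    return sorted(picked)


# Hypothetical report rows, sorted by investigator
rows = [
    {"investigator": "A", "dose": 5.0},
    {"investigator": "A", "dose": 0.001},
    {"investigator": "A", "dose": 7.5},
    {"investigator": "B", "dose": 123456.789},
    {"investigator": "B", "dose": 6.0},
    {"investigator": "B", "dose": 4.0},
]
selected = pick_test_records(rows, "investigator", "dose")
```

With this sample, five of the six rows are selected; only the unremarkable middle record of group B is skipped.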

Myth #2 The most effective way to validate reports is to have another member of the SAS team validate the output.

Usually, this is not effective for the following reasons:

  1. Most analysts abhor testing, and thus lack the motivation to test effectively.
  2. Some analysts are much more effective at it than others.  This may relate to training, motivation, or even having a "knack" for testing.
  3. Co-workers who are friends are not likely to do rigorous testing.  Conversely, "competitive" individuals may spend an inordinate amount of time trying to find a bug.
  4. When you place more than one task on an analyst's plate, testing has historically been given a low priority.  Hence, it typically does not get the attention it deserves.
  5. Team members may not have the time or energy to study the code.

Myth #3 Validating each report independently is based on good modular programming foundations.

Many of the errors encountered in clinical trials result from programs that do not communicate properly with each other.  Two programmers may each create their own definition of an event, and comparing the reports externally (i.e. by visual inspection) may or may not reveal the discrepancy.  For example, consider the following:  Analyst #1 writes the adverse event detail listing and calculates the duration of an adverse event as (end date - start date).  Analyst #2 writes the adverse event summary report and calculates the duration of an adverse event as (end date - start date + 1).  Since both formulas have been used in clinical trials, each analyst thinks that their definition is the correct one.  However, the lack of inter-program communication will cause the mean duration of an adverse event to differ by one day between the two reports.
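
A small worked example makes the discrepancy concrete. The sketch below is in Python rather than SAS, and the three adverse events are invented purely for illustration; the two functions stand in for the formulas used by the two analysts.

```python
from datetime import date

# Hypothetical (start date, end date) pairs for three adverse events
events = [
    (date(2000, 1, 3), date(2000, 1, 5)),
    (date(2000, 2, 10), date(2000, 2, 10)),
    (date(2000, 3, 1), date(2000, 3, 8)),
]


def duration_listing(start, end):
    """Analyst #1's definition: end date - start date."""
    return (end - start).days


def duration_summary(start, end):
    """Analyst #2's definition: end date - start date + 1."""
    return (end - start).days + 1


mean_listing = sum(duration_listing(s, e) for s, e in events) / len(events)
mean_summary = sum(duration_summary(s, e) for s, e in events) / len(events)
difference = mean_summary - mean_listing
```

Here the listing would imply a mean duration of 3 days and the summary a mean of 4 days. Because every event is shifted by the same amount, the means differ by exactly one regardless of the data. One way to avoid this is to have both programs use a single shared definition.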

Myth #4 Reports need not be re-validated if there are only minor changes in the data.

Too often, code is written to accommodate the format of the original data, rather than in a more general way that can handle changes in the data.  As a result, minor changes in the data can cause dramatic changes in the output.  For example, an analyst may develop a format (i.e. via PROC FORMAT) that reflects the various conditions of "severity."  Thus, they may develop a format with values of "1", "2" and "3" to respectively represent "Mild", "Moderate" and "Severe".  Later, the database is updated and a value of "4", which represents "Life Threatening", is added.  With a fixed format, this new critical value may be dropped from the calculations.
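
The following sketch shows one way such a value can go missing, and a simple guard against it. It is written in Python rather than SAS, with a dictionary standing in for the fixed format; the function name `summarize` and the sample codes are hypothetical. It assumes a summary that tallies only the labels it knows about, which is one possible failure mode and not necessarily how any given SAS program behaves.

```python
from collections import Counter

# Stand-in for the fixed format: only the original three codes are defined
SEVERITY_FORMAT = {"1": "Mild", "2": "Moderate", "3": "Severe"}


def summarize(codes, fmt):
    """Count events per severity label and track codes the format lacks."""
    counts = Counter()
    unmapped = Counter()
    for code in codes:
        if code in fmt:
            counts[fmt[code]] += 1
        else:
            # Without this branch the record would vanish from the summary
            unmapped[code] += 1
    return counts, unmapped


# Hypothetical data after the database update: code "4" now appears
codes = ["1", "2", "2", "3", "4", "1", "4"]
counts, unmapped = summarize(codes, SEVERITY_FORMAT)
total_reported = sum(counts.values())
```

The summary accounts for only five of the seven records. Reporting the contents of `unmapped` (or failing the run when it is not empty) turns a silent omission into a visible one.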

Myth #5 There is no need to validate the transmission of data from one system to another because electronic transfer is for all practical purposes quite accurate.

While the second part is true (data transfer is usually quite reliable), differences in the internal structures of databases can result in nasty surprises.  For instance, one database may be more permissive in the way dates can be stored.  If SAS does not have an equivalent date format, it will most likely set the date to missing, WITHOUT warning.  Another common example is that SAS version 6.12 (and earlier releases) can only handle character strings of up to 200 characters.  Many databases permit storage of greater lengths, and hence truncation will occur when the data are transferred to SAS data sets.
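
A pre-transfer check along these lines can flag both problems before they reach the SAS data sets. The sketch is in Python, not SAS, and everything in it is an assumption made for illustration: the function name `check_transfer`, the field names, the sample rows, and the choice of a single accepted date layout (`YYYY-MM-DD`). Only the 200-character limit comes from the text above.

```python
from datetime import datetime

MAX_LEN = 200  # character limit cited above for SAS 6.12 and earlier


def check_transfer(source_rows, date_key, text_key, date_format="%Y-%m-%d"):
    """List (row index, field, reason) for values at risk during transfer."""
    problems = []
    for i, row in enumerate(source_rows):
        try:
            # A date the target cannot parse would silently become missing
            datetime.strptime(row[date_key], date_format)
        except ValueError:
            problems.append((i, date_key, "date would not convert"))
        if len(row[text_key]) > MAX_LEN:
            problems.append((i, text_key, "text would be truncated"))
    return problems


# Hypothetical source rows: one partial date, one over-long comment
rows = [
    {"visit_date": "1999-04-12", "comment": "ok"},
    {"visit_date": "04/1999", "comment": "partial date"},
    {"visit_date": "1999-05-01", "comment": "x" * 250},
]
problems = check_transfer(rows, "visit_date", "comment")
```

Rows 1 and 2 are flagged here. A complementary check after the transfer is to compare the counts of missing dates and the maximum string lengths in the source and target.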