Trial Mining

Problem
Generating representative trials is important for ensuring validity, but it is not always possible to generate a trial given specific experimental factor levels such as size, complexity, or density. In other words, the metrics used to characterize a trial may be descriptive rather than generative, and determining how to use them to generate specific trials is too complex or time-consuming.

Solution
Instead of generating a specific trial from parameters, generate a large number (tens of thousands) of entirely random trials and calculate the factor metrics for each one. The descriptive statistics for these generated trials and their calculated metrics will give you an idea of which metrics are important, how they are distributed, and what their relevant levels are (see also Factor Mining).
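The generate-and-measure step can be sketched as follows. This is a minimal illustration, not the method of any particular study: the trial is assumed to be a random undirected graph, and the function names (random_trial, trial_metrics, build_database, describe) and the three metrics are hypothetical stand-ins for whatever your domain requires.

```python
import random
import statistics


def random_trial(rng, max_nodes=20):
    """Hypothetical trial: a random undirected graph stored as an edge set."""
    n = rng.randint(5, max_nodes)
    p = rng.random()  # edge probability, itself drawn at random
    edges = {(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p}
    return {"nodes": n, "edges": edges}


def trial_metrics(trial):
    """Descriptive metrics computed after the fact (assumed for illustration)."""
    n = trial["nodes"]
    m = len(trial["edges"])
    return {
        "size": n,
        "density": 2 * m / (n * (n - 1)),
        "mean_degree": 2 * m / n,
    }


def build_database(count, seed=0):
    """Generate `count` random trials and pair each with its metrics."""
    rng = random.Random(seed)
    database = []
    for _ in range(count):
        trial = random_trial(rng)
        database.append((trial, trial_metrics(trial)))
    return database


def describe(database):
    """Summarize each metric across the whole database."""
    summary = {}
    for name in database[0][1]:
        values = [metrics[name] for _, metrics in database]
        summary[name] = {
            "mean": statistics.mean(values),
            "stdev": statistics.stdev(values),
            "quartiles": statistics.quantiles(values, n=4),
        }
    return summary


db = build_database(2000)  # a real study would likely use far more
summary = describe(db)
for name, stats in summary.items():
    print(name, stats)
```

Inspecting the summary (or plotting histograms of each metric) is one way to decide where to place the level intervals.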

Once the factor levels have been determined (as intervals for each metric), you can search your database of random trials and pick trials that meet your criteria. To avoid inadvertently picking outlier trials, consider selecting only trials that fall within a specific confidence interval (e.g., 0.7) for each metric.
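The selection step might look like the sketch below. It rests on two assumptions that are not stated in the pattern itself: the "0.7 confidence interval" is read here as the central 70% of each metric's empirical distribution, and the metric names, the synthetic database, and the level interval for density are all hypothetical.

```python
import random


def central_interval(values, coverage=0.7):
    """Bounds of the central `coverage` share of the values (empirical)."""
    ordered = sorted(values)
    tail = (1 - coverage) / 2
    last = len(ordered) - 1
    return ordered[int(tail * last)], ordered[int((1 - tail) * last)]


def within(metrics, bounds):
    """True if every bounded metric lies inside its (low, high) interval."""
    return all(low <= metrics[name] <= high for name, (low, high) in bounds.items())


# Stand-in for a database of random trials with precomputed metrics.
rng = random.Random(1)
database = [
    {"id": i, "density": rng.random(), "mean_degree": rng.gauss(6.0, 2.0)}
    for i in range(5000)
]

# Outlier guard: the central 70% of each metric (assumed interpretation).
typical = {
    name: central_interval([trial[name] for trial in database], coverage=0.7)
    for name in ("density", "mean_degree")
}

# Hypothetical factor level for one experimental condition.
condition = {"density": (0.30, 0.50)}

candidates = [t for t in database if within(t, typical) and within(t, condition)]
chosen = rng.sample(candidates, k=min(10, len(candidates)))
print(len(candidates), "candidates;", [t["id"] for t in chosen])
```

Sampling at random from the candidates, rather than taking the first matches, keeps the choice independent of the order in which the trials were generated.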

Consequences
Using Trial Mining means that you give up the ability to generate a representative trial for a specific experimental condition, and instead select from a database of randomly generated trials. You may have to generate a very large number of random trials, the vast majority of which you will discard and never use. All unused trials represent wasted time and effort. This pattern also hinges on being able to generate an unlimited number of random trials, which is not possible in every domain (for example, text, images, or audio).

Examples
The Trial Mining pattern was used in a study on the perception of animated node-link diagrams of dynamic graphs, i.e., graphs that change over time (Ghani et al. 2012). It was unclear what constituted a representative trial for dynamic graphs, so in an initial study, a large number of trials (240,000) were generated. The different graph metrics (node speed, degree, distance, etc.) were calculated for each trial, and the actual trials used were picked from within the 0.7 confidence interval for each metric.