To evaluate an innovation in computer systems, a performance analyst measures execution time or other metrics using one or more standard workloads. The analyst may carefully minimize the amount of measurement instrumentation, control the environment in which the measurements take place, and repeat each measurement multiple times. Finally, the analyst may use statistical techniques to characterize the data.
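For concreteness, this kind of repeated measurement and statistical characterization might look like the following minimal sketch; the toy workload, repetition count, and normal-approximation confidence interval are illustrative assumptions rather than details taken from this paper.

```python
# Minimal sketch (illustrative assumptions): repeat a measurement several
# times and characterize the sample with a mean and a 95% confidence interval.
import statistics
import time


def run_workload():
    # Stand-in for a standard benchmark workload.
    sum(i * i for i in range(1_000_000))


def measure(repetitions=30):
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_workload()
        samples.append(time.perf_counter() - start)
    return samples


samples = measure()
mean = statistics.mean(samples)
# Normal-approximation 95% confidence interval for the mean execution time.
half_width = 1.96 * statistics.stdev(samples) / len(samples) ** 0.5
print(f"mean = {mean:.4f} s, 95% CI half-width = {half_width:.4f} s")
```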
Unfortunately, even with such a responsible approach, the collected data may be misleading. This paper shows how easy it is to produce poor-quality (and thus misleading) data for computer systems because of the observer effect and measurement bias. The observer effect occurs when data collection perturbs the behavior of the system being measured. Measurement bias occurs when the particular environment in which the measurements take place favors some configurations over others. This paper demonstrates that both effects have a significant impact on measured performance, large enough to easily mislead a performance analyst into incorrect conclusions. Nevertheless, in our literature survey of recent PACT, CGO, and PLDI papers, we found that papers rarely acknowledged these effects or used reliable techniques to avoid them.
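As an illustration of how measurement bias might be exposed empirically, the sketch below times the same command under process environments of different sizes; the command, padding variable, and sizes are hypothetical placeholders, not this paper's experimental setup.

```python
# Hypothetical sketch: time the same command under environments of different
# sizes; if the timings differ systematically, the setup is biasing the data.
import os
import statistics
import subprocess
import sys
import time

# Placeholder workload: a Python one-liner standing in for a real benchmark.
COMMAND = [sys.executable, "-c", "sum(i * i for i in range(10**6))"]


def mean_time(env_padding_bytes, repetitions=10):
    env = dict(os.environ, PADDING="x" * env_padding_bytes)
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        subprocess.run(COMMAND, env=env, check=True)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)


for padding in (0, 1024, 4096):
    print(f"env padding {padding:5d} bytes: {mean_time(padding):.4f} s")
```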
We describe and demonstrate techniques that help a performance analyst identify situations in which the data are of poor quality. These techniques are based on causality analysis and statistical methods that the natural and social sciences routinely use to guard against the observer effect and measurement bias.
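One way to apply such statistical methods is to randomize the experimental setup across repeated runs and then summarize the resulting speedups with a confidence interval; the sketch below assumes hypothetical run_config and random_setup helpers and is not this paper's actual evaluation harness.

```python
# Hypothetical sketch: evaluate a speedup under many randomized experimental
# setups and summarize it with a confidence interval, so that a conclusion is
# not tied to one incidental configuration of the environment.
import random
import statistics
import time


def run_config(name, setup):
    # Placeholder: in practice this would run the real benchmark binary for
    # configuration `name` under `setup` (environment size, link order, ...).
    work = 1_000_000 if name == "baseline" else 800_000
    start = time.perf_counter()
    sum(i * i for i in range(work))
    return time.perf_counter() - start


def random_setup():
    # Randomly vary incidental setup factors that might bias measurements.
    return {"env_padding": random.randrange(0, 4096),
            "link_order_seed": random.randrange(1 << 16)}


setups = [random_setup() for _ in range(20)]
speedups = [run_config("baseline", s) / run_config("optimized", s) for s in setups]
mean = statistics.mean(speedups)
half_width = 1.96 * statistics.stdev(speedups) / len(speedups) ** 0.5
print(f"speedup = {mean:.3f} +/- {half_width:.3f} (95% CI)")
# A wide interval, or one that spans 1.0, suggests the apparent speedup may be
# an artifact of the measurement setup rather than a real improvement.
```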