Diffing pipeline execution logs

When refactoring pipeline code, it is common to want to restructure and modify the code but leave the overall behaviour unchanged. It falls to the developer to manually verify that the pipeline behaviour has not changed.

There are three general ways to compare pipeline executions and determine whether pipeline behaviour has changed:

  1. Check whether the same CASA tasks are called in the same sequence and with the same arguments;

  2. In addition to the above, check whether the same sequence of CASA tool commands is invoked;

  3. Check whether the output data are identical (within limits).

This note describes ways the pipeline can assist in scenarios #1 and #2. The comparison of output caltables and images is considered out of scope for this document.

Ordered pipeline execution logs

In short, to compare two pipeline runs, it is simplest to diff the log files for each run. However, the standard CASA command log (casa_commands.log) is of limited use due to the number of false positives raised with diff; there are several unordered task arguments (spw, flag commands, output directories, etc.) which can vary from run to run but actually result in identical data on disk.

The pipeline has a debug mode to facilitate the comparison of pipeline runs. Lowering the log threshold from INFO results enables a debug mode which processes all CASA task calls and CASA tool calls and logs them to a file. The log records are processed to give a stable output; arguments are sorted and variable arguments replaced (UUIDs, directory paths, etc.) so that no/minimal false positives are raised.

The filename of the processed execution log is of the format casacalls-<HOSTNAME>.txt. The hostname is included so that the output of each MPI client can be determined for HPC executions.

Log all CASA task calls

To log all CASA tasks to a file suitable for diff, use a log level of DEBUG, e.g.,

h_init(loglevel='debug')

or if you are using recipereducer,

pipeline.recipereducer.reduce(vis=['uid___A002_X30a93d_X43e'], loglevel='debug')

Log all CASA tool calls

To record all CASA task and CASA tool invocations, use a log level of TRACE, e.g.,

h_init(loglevel='trace')

or if you are using recipereducer,

pipeline.recipereducer.reduce(vis=['uid___A002_X30a93d_X43e'], loglevel='trace')

To restrict log output to specific tools, see Logging.

Example

Scenario: two branches have different code but should result in identical CASA calls. This can be tested by running an execution for both branches, keeping the:

  1. same CASA version

  2. same data

  3. same pipeline recipe

From an initial root directory, run the pipeline procedure for the main branch with

mkdir main && cd main
pixi run -e default casa
import pipeline
pipeline.recipereducer.reduce(vis=['/rawdata/uid___A002_X30a93d_X43e'],
                              procedure='hifacal.xml',
                              loglevel='debug')

From the same initial root directory, run the pipeline procedure for the feature branch:

mkdir branch && cd branch
pixi run -e default casa
import pipeline
pipeline.recipereducer.reduce(vis=['/rawdata/uid___A002_X30a93d_X43e'],
                              procedure='hifacal.xml',
                              loglevel='debug')

You can now diff the output with:

diff main/casacalls-<hostname>.txt branch/casacalls-<hostname>.txt