Pipeline with Conda¶

Note

For new development setups, using Pixi directly is recommended. Pixi provides a reproducible, lock-file-based environment that is simpler to maintain than a manually managed Conda environment.

Warning

Running the pipeline data-processing workflow from a modular CASA 6 setup is not officially supported or validated for observatory operations. The information provided here is for development and demonstration purposes only.

Step-by-step¶

Install Miniforge or Micromamba. The example below uses the Miniforge3 installer, which enables only the conda-forge channel by default.

#!/bin/bash

# 1. Detect OS and Architecture
OS=$(uname -s | sed 's/Darwin/MacOSX/')
ARCH=$(uname -m)

# 2. Construct the download URL
URL="https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-${OS}-${ARCH}.sh"

# 3. Download the installer
echo "Downloading Miniforge for ${OS}-${ARCH}..."
curl -L "$URL" -o miniforge.sh

# 4. Run the installer (adjust the installation path as needed)
echo "Installing to /opt/miniforge3..."
bash miniforge.sh -b -f -p /opt/miniforge3

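# 5. Update and clean up (call conda by full path: the batch install does not modify PATH)
/opt/miniforge3/bin/conda update --all -y
/opt/miniforge3/bin/conda clean -a -y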
rm miniforge.sh
echo "Installation complete."

Reproduce a Python environment that contains the modular CASA6 components plus the libraries that they and Pipeline depend on, e.g., openmpi.

Fetch source code:

git clone https://open-bitbucket.nrao.edu/scm/pipe/pipeline.git
cd pipeline

Generate environment.yml on demand from pixi (environment.yml is no longer committed to the repository — see Pixi and Running Pixi tasks for details):
```
pixi project export conda-environment \
    | sed 's|^- pip:$|- pip:\n  - --extra-index-url https://casa-pip.nrao.edu/repository/pypi-group/simple|' \
    > environment.yml
```
The sed step injects the NRAO CASA pip index URL, which pixi's export drops by default. If pixi is not yet installed, see Pixi and Running Pixi tasks for the one-line installer.
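
The sed expression above relies on `\n` in the replacement text being turned into a newline, which GNU sed does but some other sed implementations (for example the BSD sed shipped with macOS) may not. As a hedged alternative, the same injection can be sketched in Python. The function name `inject_extra_index` is hypothetical and not part of Pipeline or pixi; the index URL is the one used above.

```python
CASA_INDEX_URL = "https://casa-pip.nrao.edu/repository/pypi-group/simple"


def inject_extra_index(yaml_text, index_url=CASA_INDEX_URL):
    """Add an --extra-index-url entry right after each unindented '- pip:' line."""
    out = []
    for line in yaml_text.splitlines():
        out.append(line)
        # Mirror the sed pattern '^- pip:$': only an exact, unindented match qualifies.
        if line == "- pip:":
            out.append(f"  - --extra-index-url {index_url}")
    return "\n".join(out) + "\n"
```

To use it as a filter, one could pass `sys.stdin.read()` to the function and write the result to `environment.yml`, in place of the sed stage of the pipe.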

Create the Conda environment from the generated file:

conda env create --name pipeline --file=environment.yml

This will create a Conda environment named pipeline.

Note

To update or remove a pre-existing environment:

conda env update --name pipeline --file=environment.yml           # Update existing
conda env update --name pipeline --file=environment.yml --prune   # Also remove unlisted packages
conda env remove --name pipeline                                  # Remove the environment entirely
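
Choosing between `conda env create` and `conda env update` can be automated. The sketch below assumes the JSON printed by `conda env list --json`, which lists environment paths under an `envs` key, is passed in as text; the helper name `conda_env_command` is hypothetical. It only builds the argument list and does not run conda.

```python
import json
import os


def conda_env_command(name, env_list_json, env_file="environment.yml"):
    """Return the conda command (as an argument list) to create or update `name`."""
    env_paths = json.loads(env_list_json).get("envs", [])
    # An environment exists if some listed path ends with its name.
    exists = any(os.path.basename(p.rstrip("/")) == name for p in env_paths)
    action = "update" if exists else "create"
    return ["conda", "env", action, "--name", name, f"--file={env_file}"]
```

The returned list could then be handed to `subprocess.run(..., check=True)`.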

Activate the environment and verify the CASA6 software stack installation:

conda activate pipeline
# Create the default CASA data directory if it doesn't exist (can be customized later)
mkdir -p ~/.casa/data
# Verify that casatools imports and runs; if CASA data is not present locally, casaconfig may fetch it from the internet
# Also see: https://casadocs.readthedocs.io/en/stable/api/casaconfig.html
python -c "import casatools; print('casatools version:', casatools.version_string())"
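
A slightly broader check, as a sketch only: the snippet below looks up the modular CASA6 packages mentioned on this page without importing them, so it does not trigger any CASA data download. The helper name `missing_modules` and the choice of packages to check are assumptions.

```python
import importlib.util

# Modular CASA6 packages referenced on this page (assumed list, adjust as needed).
CASA6_MODULES = ("casatools", "casatasks", "casashell", "casampi")


def missing_modules(names=CASA6_MODULES):
    """Return the names that cannot be located on the current Python path."""
    return [n for n in names if importlib.util.find_spec(n) is None]
```

An empty list from `missing_modules()` suggests that all four packages are installed in the active environment; it does not prove that they work.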

Install Pipeline:
```
pip install .
```
To install Pipeline along with optional dependencies for developmental and experimental purposes in editable mode, try:
```
pip install -e ".[dev,docs]"
```
Our Read the Docs setup of Pipeline uses this approach for documentation builds (see .readthedocs.yaml).

Note

pyproject.toml and requirements.txt
- environment.yml is no longer committed to the repository. Generate it on demand with pixi project export conda-environment as shown above (see Pixi and Running Pixi tasks). Its purpose is to define a self-contained Python environment with all CASA6 components and dependencies required by Pipeline.
- pyproject.toml handles Pipeline packaging and build system requirements.
- A separate requirements.txt handles Pipeline core/functional dependencies. The separation is intentional: it balances different use cases, e.g. monolithic and modular CASA6 builds, developer/testing installation setups, etc.

Run Pipeline¶

Pipeline is typically run either in a headless environment or interactively on a workstation, in CASA serial or parallel mode.

For interactive use, start a casashell session:

conda activate pipeline
python -m casashell

For headless sessions to execute automated Pipeline data processing:

conda activate pipeline
xvfb-run -a python -m casashell --nologger --log2term --agg -c run_pipeline.py

Here run_pipeline.py is a Python script. Example content could be:

import pipeline.recipereducer
pipeline.recipereducer.reduce(vis=['../rawdata/uid___A002_Xc46ab2_X15ae_repSPW_spw16_17_small.ms'],
                              procedure='procedure_hifa_calimage.xml', loglevel='debug')

or alternatively:

import pipeline
pipeline.initcli()

context = h_init()
context.set_state('ProjectStructure', 'recipe_name', 'hifa_calimage')
try:
    hifa_importdata(vis=['uid___A002_Xc46ab2_X15ae_repSPW_spw16_17_small.ms'], session=['default'], dbservice=True)
    hifa_flagdata()
    hifa_fluxcalflag()
    hif_rawflagchans()
    hif_refant()
    h_tsyscal()
    hifa_tsysflag()
    hifa_tsysflagcontamination()
    hifa_antpos()
    hifa_wvrgcalflag()
    hif_lowgainflag()
    hif_setmodels()
    hifa_bandpassflag()
    hifa_bandpass()
    hifa_spwphaseup()
    hifa_gfluxscaleflag()
    hifa_gfluxscale()
    hifa_timegaincal()
    hifa_renorm(createcaltable=True, atm_auto_exclude=True)
    hifa_targetflag()
    hif_applycal()
    hif_makeimlist(intent='PHASE,BANDPASS,AMPLITUDE')
    hif_makeimages()
    hif_makeimlist(intent='CHECK', per_eb=True)
    hif_makeimages()
    hifa_imageprecheck()
    hif_checkproductsize(maxcubesize=40.0, maxcubelimit=60.0, maxproductsize=500.0)
finally:
    h_save()

The examples below show more tightly controlled ways to run Pipeline.

Serial¶

A plain Python session without invoking casashell:
```
PYTHONNOUSERSITE=1 OMP_NUM_THREADS=4 OPENBLAS_NUM_THREADS=4 xvfb-run -a python ../scripts/run_pipeline.py
```
Here we set the PYTHONNOUSERSITE environment variable to 1 so that the user site-packages directory is ignored, which avoids potential package conflicts. We also set OMP_NUM_THREADS and OPENBLAS_NUM_THREADS to control the number of threads used by OpenMP/OpenBLAS-enabled libraries (e.g., numpy, scipy, casatools) during Pipeline processing.
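
When the job is launched from a Python driver instead of the shell, the same three variables can be set on the child process environment. This is a minimal sketch; the helper name `threaded_env` is hypothetical.

```python
import os


def threaded_env(num_threads, base=None):
    """Copy an environment and add the isolation and thread-count settings."""
    env = dict(os.environ if base is None else base)
    env["PYTHONNOUSERSITE"] = "1"  # ignore the user site-packages directory
    env["OMP_NUM_THREADS"] = str(num_threads)
    env["OPENBLAS_NUM_THREADS"] = str(num_threads)
    return env
```

The result can be passed as the `env` argument of `subprocess.run`, for example together with the command `["xvfb-run", "-a", "python", "../scripts/run_pipeline.py"]`.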

A session via casashell, with CASA6 logging and plotting enabled:

PYTHONNOUSERSITE=1 OMP_NUM_THREADS=4 OPENBLAS_NUM_THREADS=4 xvfb-run -a \
    python -m casashell --nologger --log2term --agg -c ../scripts/run_pipeline.py

Parallel¶

A plain Python session without invoking casashell:
```
PYTHONNOUSERSITE=1 OMP_NUM_THREADS=1 OPENBLAS_NUM_THREADS=1 \
    mpirun --mca btl_vader_single_copy_mechanism none -x OMP_NUM_THREADS -x OPENBLAS_NUM_THREADS -x PRTE_MCA_quiet -np 4 \
    python -c "import casampi.private.start_mpi; exec(open('run_pipeline.py').read())"
```
casampi.private.start_mpi

As discussed/examined in CAS-14037, casashell includes some configuration-dependent (modular vs. monolithic) environment initialization that helps casampi set up the client and server roles for the different openmpi processes while avoiding circular imports during casampi process initialization. Without casashell, you need to execute casampi.private.start_mpi outside the scope of casatasks (casashell implicitly imports casatasks in monolithic CASA distributions). As a workaround, include the following boilerplate at the beginning of your workflow script.

try:
    import casampi.private.start_mpi  # assign the client and server roles
    import casatasks  # ensure the time-based logfile name
except (ImportError, RuntimeError) as error:
    pass

Alternatively, as the example above shows, pass these imports in a one-line command with the -c option of the python executable. If you run a parallel CASA session without going through casashell (e.g., `mpirun -n 4 python run_script.py`), place the code snippet above at the beginning of your Python script, before any casatasks import, to avoid deadlocks.

Otherwise, all openmpi processes are initialized identically and each one executes the content of your script concurrently, instead of forming the expected 1 x mpiclient + (nproc-1) x mpiserver configuration.

A session via casashell, with CASA6 logging and plotting enabled:
```
PYTHONNOUSERSITE=1 OMP_NUM_THREADS=1 xvfb-run -a \
mpirun -display-allocation -display-map -oversubscribe --mca btl_vader_single_copy_mechanism none -x OMP_NUM_THREADS -n 4 \
        python -c "import casampi.private.start_mpi; import casashell.__main__" --nologger --log2term --agg -c ../scripts/run_pipeline.py
```
If you run a parallel CASA session with casashell, add the code snippet above to ~/.casa/config.py. Note that the command uses python -c "import casampi.private.start_mpi; import casashell.__main__" instead of python -m casashell so that start_mpi runs before casatasks is imported. Otherwise, the first time casatasks is imported in an MPI server process, it attempts to start a new MPI environment, which leads to a deadlock.
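
The difference between the two parallel launch variants can be summarized in a short sketch. The helper name `mpirun_command` is hypothetical, the mpirun options are only a subset of those shown above, and the function merely assembles an argument list without executing anything.

```python
START_MPI = "import casampi.private.start_mpi"


def mpirun_command(nproc, script, use_casashell=False):
    """Build an mpirun argument list in which start_mpi always runs first."""
    cmd = ["mpirun", "--mca", "btl_vader_single_copy_mechanism", "none",
           "-x", "OMP_NUM_THREADS", "-n", str(nproc), "python", "-c"]
    if use_casashell:
        # casashell is imported after start_mpi and receives its own options.
        cmd += [f"{START_MPI}; import casashell.__main__",
                "--nologger", "--log2term", "--agg", "-c", script]
    else:
        # Plain Python: run start_mpi, then execute the script in the same process.
        cmd += [f"{START_MPI}; exec(open({script!r}).read())"]
    return cmd
```

In both variants the `-c` string begins with the start_mpi import, which is the ordering described above.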

Running a parallel CASA session from macOS

A parallel Pipeline data processing session might hang on macOS at the completion of the job due to lingering casaplotms.app sub-processes. This behavior appears to differ from Linux, potentially because each casaplotms process spawned from an MPIserver process runs as a macOS "app". Although this does not affect the data processing, you may need to add the following snippet at the end of your Python job script to ensure a clean exit:

def close_plotms_on_mpiservers():
    try:
        from casampi.MPIEnvironment import MPIEnvironment
        from casampi.MPICommandClient import MPICommandClient
        client = MPICommandClient()
        mpi_server_list = MPIEnvironment.mpi_server_rank_list()
        client.push_command_request('from casaplotms import plotmstool', block=True,
                                    target_server=mpi_server_list)
        rs_list = client.push_command_request('plotmstool.__proc!=None', block=True,
                                              target_server=mpi_server_list)
        servers_with_active_plotms = [rs['server'] for rs in rs_list if rs['ret']]
        if servers_with_active_plotms:
            print(f'servers with active plotms instances: {servers_with_active_plotms}')
            client.push_command_request('plotmstool.__proc.kill()', block=True,
                                        target_server=servers_with_active_plotms)
    except Exception:
        pass

close_plotms_on_mpiservers()
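
If the same job script is shared between Linux and macOS, the cleanup call can be restricted to macOS. This is a sketch only; the helper name `call_on_macos` is hypothetical, and it relies on `platform.system()` returning "Darwin" on macOS.

```python
import platform


def call_on_macos(cleanup, system=None):
    """Run `cleanup` only when the platform is macOS; return whether it ran."""
    system = platform.system() if system is None else system
    if system != "Darwin":
        return False
    cleanup()
    return True
```

At the end of a job script, `call_on_macos(close_plotms_on_mpiservers)` would then replace the unconditional call.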

In addition, xvfb-run is not available on macOS, even if Xvfb/X11 is installed, so you may not be able to run headless sessions this way. To complete a Pipeline processing session that requires casaplotms, you must log in remotely with GUI access: the casaplotms GUI will appear in the desktop environment but cannot be forwarded via X11.
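
A launcher script that has to work on both platforms can check for xvfb-run before using it. The sketch below is an assumption about how one might do this; the helper name `headless_prefix` is hypothetical, and the lookup uses the standard-library `shutil.which`.

```python
import shutil


def headless_prefix(which=shutil.which):
    """Return the xvfb-run prefix if the executable is on PATH, else nothing."""
    return ["xvfb-run", "-a"] if which("xvfb-run") else []
```

The returned list can be prepended to the command, e.g. `headless_prefix() + ["python", "-m", "casashell"]`, so that the same launcher falls back to a plain invocation where xvfb-run is missing.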

Useful shorthand¶

Useful aliases/shortcuts to emulate monolithic CASA executables:

conda activate pipeline

export casa_omp_num_threads=4
export casa_mpi_nproc=4
export TMPDIR=/tmp

export casa6_opts_custom='--nologger --log2term --agg'
export mpirun_custom='mpirun -display-allocation -display-map -oversubscribe --mca btl_vader_single_copy_mechanism none --mca btl ^openib -x OMP_NUM_THREADS -x PYTHONNOUSERSITE'
export xvfb_run_auto='xvfb-run -a' # Debian, Ubuntu, RedHat8, etc.

alias casa6='PYTHONNOUSERSITE=1 OMP_NUM_THREADS=${casa_omp_num_threads} python -m casashell'
alias casa6mpi='PYTHONNOUSERSITE=1 OMP_NUM_THREADS=1 ${mpirun_custom} -n ${casa_mpi_nproc} python -c "import casampi.private.start_mpi; import casashell.__main__"'

# For Linux only, not applicable on macOS
alias casa6_xvfb='PYTHONNOUSERSITE=1 OMP_NUM_THREADS=${casa_omp_num_threads} ${xvfb_run_auto} python -m casashell'
alias casa6mpi_xvfb='PYTHONNOUSERSITE=1 OMP_NUM_THREADS=1 ${xvfb_run_auto} ${mpirun_custom} -n ${casa_mpi_nproc} python -c "import casampi.private.start_mpi; import casashell.__main__"'

For executing a headless parallel Pipeline processing session on Linux, one could try:

casa6mpi_xvfb ${casa6_opts_custom} -c ../scripts/run_pltest.py

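To run an 8-process mpicasa session (1 client + 7 servers), set casa_mpi_nproc on its own line first. A prefix assignment on the same command line would not affect the ${casa_mpi_nproc} expansion inside the alias:

export casa_mpi_nproc=8
casa6mpi_xvfb ${casa6_opts_custom} -c ../scripts/run_pltest.py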
