htcondenser.dagman module

DAGMan class to handle DAGs in HTCondor.

class htcondenser.dagman.DAGMan(filename='jobs.dag', status_file='jobs.status', status_update_period=30, dot=None, other_args=None)[source]

Bases: object

Class to implement DAG, and manage Jobs and dependencies.

Parameters:
  • filename (str) – Filename of the DAG file to write. It cannot be on /users; it must be on an NFS drive, e.g. /storage.
  • status_file (str, optional) – Filename for DAG status file. See https://research.cs.wisc.edu/htcondor/manual/current/2_10DAGMan_Applications.html#SECTION0031012000000000000000
  • status_update_period (int or str, optional) – Refresh period for DAG status file in seconds.
  • dot (str, optional) – Filename for dot file. The dot program can then be used to generate a pictorial representation of jobs in the DAG and their relationships.
  • other_args (dict, optional) – Dictionary of {variable: value} for other DAG options.
JOB_VAR_NAME

str

Name of variable to hold job arguments string to pass to condor_worker.py, required in both DAG file and condor submit file.

JOB_VAR_NAME = 'jobOpts'
add_job(job, requires=None, job_vars=None, retry=None)[source]

Add a Job to the DAG.

Parameters:
  • job (Job) – Job object to be added to DAG
  • requires (str, Job, iterable[str], iterable[Job], optional) – A single Job or job name, or a collection of them, that must run before this job can run. The job(s) specified here are the parents, and the added job is their child.
  • job_vars (str, optional) – String of job variables specifically for the DAG. Note that program arguments should be set in Job.args not here.
  • retry (int or str, optional) – Number of retry attempts for this job. By default the job runs once; if its exit code is non-zero, the job is treated as failed.
Raises:
  • KeyError – If a Job with that name has already been added to the DAG.
  • TypeError – If the job argument is not of type Job. If requires argument is not of type str, Job, iterable(str) or iterable(Job).
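
The accepted forms of requires and the two exceptions listed above can be illustrated with a minimal, self-contained sketch. It does not use htcondenser: the Job class, normalise_requires and the dict-based add_job below are hypothetical stand-ins, and the real method may differ in detail.

```python
class Job:
    """Hypothetical stand-in for htcondenser's Job; only the name is modelled."""

    def __init__(self, name):
        self.name = name


def normalise_requires(requires):
    """Return a list of parent job names from any accepted `requires` form."""
    if requires is None:
        return []
    if isinstance(requires, str):
        return [requires]
    if isinstance(requires, Job):
        return [requires.name]
    try:
        items = list(requires)
    except TypeError:
        raise TypeError("requires must be a str, a Job, or an iterable of them")
    names = []
    for item in items:
        if isinstance(item, str):
            names.append(item)
        elif isinstance(item, Job):
            names.append(item.name)
        else:
            raise TypeError("requires items must be str or Job, got %r" % type(item))
    return names


def add_job(dag, job, requires=None):
    """Register `job` in the plain dict `dag`, keyed by job name."""
    if not isinstance(job, Job):
        raise TypeError("job must be a Job")
    if job.name in dag:
        raise KeyError("Job %r has already been added to the DAG" % job.name)
    dag[job.name] = normalise_requires(requires)


dag = {}
add_job(dag, Job("a"))
add_job(dag, Job("b"), requires="a")              # single job name
add_job(dag, Job("c"), requires=[Job("a"), "b"])  # mixed iterable
```

Storing parents as names rather than objects keeps later lookups (existence and cycle checks) simple; this is a choice made for the sketch only.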
check_job_acyclic(job)[source]

Check that there are no circular requirements, e.g. A -> B -> A

Get all requirements for all parent jobs recursively, and check for the presence of this job in that list.

Parameters: job (Job or str) – Job or job name to check
Raises: RuntimeError – If job has circular dependency.
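
The recursive check described above can be sketched as follows. This is an illustration only, not the library's code: it assumes the DAG is held as a plain dict mapping each job name to the names it requires, and all_ancestors and check_acyclic are hypothetical helper names.

```python
def all_ancestors(name, parents, _seen=None):
    """Collect every direct and indirect parent of `name`.

    `parents` maps a job name to the list of job names it requires.
    The `_seen` set stops the recursion from looping on a cycle.
    """
    seen = set() if _seen is None else _seen
    for parent in parents.get(name, []):
        if parent not in seen:
            seen.add(parent)
            all_ancestors(parent, parents, seen)
    return seen


def check_acyclic(name, parents):
    """Raise RuntimeError if `name` appears among its own ancestors."""
    if name in all_ancestors(name, parents):
        raise RuntimeError("Job %r has a circular dependency" % name)


# C requires A and B, B requires A: no cycle, so this returns quietly.
check_acyclic("C", {"A": [], "B": ["A"], "C": ["A", "B"]})
```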
check_job_requirements(job)[source]

Check that the required Jobs actually exist and have been added to DAG.

Parameters:

job (Job or str) – Job object or name of Job to check.

Raises:
  • KeyError – If job(s) have prerequisite jobs that have not been added to the DAG.
  • TypeError – If job argument is not of type str or Job, or an iterable of strings or Jobs.
generate_dag_contents()[source]

Generate DAG file contents as a string.

Returns: DAG file contents
Return type: str
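
As a rough illustration of what such a string might contain, the sketch below joins per-job lines with the DAG-level options from the constructor. NODE_STATUS_FILE and DOT are standard HTCondor DAG file commands; how htcondenser actually orders or formats them is an assumption here, and dag_contents is a hypothetical function name.

```python
def dag_contents(job_lines, status_file="jobs.status",
                 status_update_period=30, dot=None):
    """Assemble DAG file text from job lines plus DAG-level options."""
    lines = list(job_lines)
    if status_file:
        # NODE_STATUS_FILE <file> <minimum update time in seconds>
        lines.append("NODE_STATUS_FILE %s %s" % (status_file, status_update_period))
    if dot:
        # DOT <file> asks DAGMan to write a Graphviz description of the DAG
        lines.append("DOT %s" % dot)
    return "\n".join(lines) + "\n"


text = dag_contents(["JOB a a.condor", "JOB b b.condor", "PARENT a CHILD b"],
                    dot="jobs.dot")
```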
generate_job_requirements_str(job)[source]

Generate a string of prerequisite jobs for this job.

Checks that the prerequisite Jobs exist in the DAG, and that the DAG is acyclic.

Parameters: job (Job or str) – Job object or name of job.
Returns: Job requirements string if the job has prerequisite jobs; otherwise an empty string.
Return type: str
Raises: TypeError – If job argument is not of type str or Job.
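
In an HTCondor DAG file, dependencies are written as PARENT ... CHILD ... lines. A minimal sketch of producing one, assuming the parents are held in a plain dict of job names (requirements_str is a hypothetical name, and the existence and cycle checks are omitted):

```python
def requirements_str(name, parents):
    """Return a 'PARENT ... CHILD ...' line for `name`, or '' if it has no parents."""
    required = parents.get(name, [])
    if not required:
        return ""
    return "PARENT %s CHILD %s" % (" ".join(required), name)


line = requirements_str("c", {"a": [], "c": ["a", "b"]})
```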
generate_job_str(job)[source]

Generate a string for job, for use in DAG file.

Includes condor job file, any vars, and other options e.g. RETRY. Job requirements (parents) are handled separately in another method.

Parameters: job (Job or str) – Job or job name.
Returns: name – Job listing for DAG file.
Return type: str
Raises: TypeError – If job argument is not of type str or Job.
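
For orientation, a job listing in a DAG file typically combines the standard JOB, VARS and RETRY commands. The sketch below shows one plausible shape, using the documented JOB_VAR_NAME value; the function job_str, its parameters and the exact layout are assumptions rather than the library's output.

```python
JOB_VAR_NAME = "jobOpts"  # value documented for DAGMan.JOB_VAR_NAME


def job_str(name, submit_file, job_args="", retry=None):
    """Build the JOB / VARS / RETRY lines for one job."""
    lines = ["JOB %s %s" % (name, submit_file)]
    if job_args:
        # Double quotes inside job_args would need escaping; not handled here.
        lines.append('VARS %s %s="%s"' % (name, JOB_VAR_NAME, job_args))
    if retry is not None:
        lines.append("RETRY %s %s" % (name, retry))
    return "\n".join(lines)


listing = job_str("a", "a.condor", job_args="--input x.txt", retry=3)
```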
get_jobsets()[source]

Get a list of all unique JobSets managing Jobs in this DAG.

Returns: name – List of unique JobSet objects.
Return type: list
submit(force=False, submit_per_interval=10)[source]

Write all necessary submit files, transfer files to HDFS, and submit DAG. Also prints out info for user.

Parameters:
  • force (bool, optional) – Force condor_submit_dag
  • submit_per_interval (int, optional) – Number of DAGMan submissions per interval. The default is 10 every 5 seconds.
Raises:

CalledProcessError – If condor_submit_dag returns non-zero exit code.
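
The link between the submit command and CalledProcessError can be sketched with the standard library alone. The helper names below are hypothetical, only the force option is modelled (via condor_submit_dag's -force flag), and how submit_per_interval is passed on is not shown because the source does not say.

```python
import subprocess


def build_submit_command(dag_filename, force=False):
    """Return the argument list for condor_submit_dag."""
    cmd = ["condor_submit_dag"]
    if force:
        cmd.append("-force")  # overwrite files left by an earlier submission
    cmd.append(dag_filename)
    return cmd


def submit_dag(dag_filename, force=False):
    """Run condor_submit_dag; a non-zero exit raises subprocess.CalledProcessError."""
    subprocess.check_call(build_submit_command(dag_filename, force))


cmd = build_submit_command("jobs.dag", force=True)
```

Calling submit_dag requires a working HTCondor installation; build_submit_command can be inspected without one.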

write()[source]

Write the DAG to file and cause all Jobs to write their HTCondor submit files.