htcondenser.dagman module
DAGMan class to handle DAGs in HTCondor.
-
class htcondenser.dagman.DAGMan(filename='jobs.dag', status_file='jobs.status', status_update_period=30, dot=None, other_args=None)
Bases: object
Class to implement DAG, and manage Jobs and dependencies.
Parameters: - filename (str) – Filename to write the DAG jobs to. This cannot be on /users; it must be on an NFS drive, e.g. /storage.
- status_file (str, optional) – Filename for DAG status file. See https://research.cs.wisc.edu/htcondor/manual/current/2_10DAGMan_Applications.html#SECTION0031012000000000000000
- status_update_period (int or str, optional) – Refresh period for DAG status file in seconds.
- dot (str, optional) – Filename for the dot file. dot can then be used to generate a pictorial representation of the jobs in the DAG and their relationships.
- other_args (dict, optional) – Dictionary of {variable: value} for other DAG options.
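To illustrate how these options correspond to the generated DAG file, here is a minimal sketch. NODE_STATUS_FILE and DOT are real HTCondor DAGMan keywords; the `dag_header` helper itself is hypothetical, not htcondenser's actual internals:

```python
# Hypothetical sketch: mapping constructor options to DAG file header
# lines. NODE_STATUS_FILE and DOT are real DAGMan keywords; this helper
# is illustrative only, not htcondenser code.
def dag_header(status_file='jobs.status', status_update_period=30,
               dot=None, other_args=None):
    lines = ['NODE_STATUS_FILE %s %s' % (status_file, status_update_period)]
    if dot:
        # dot can later be run on this file to draw the DAG.
        lines.append('DOT %s' % dot)
    for var, value in sorted((other_args or {}).items()):
        lines.append('%s = %s' % (var, value))
    return '\n'.join(lines)

print(dag_header(dot='jobs.dot'))
```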
-
JOB_VAR_NAME = 'jobOpts'
str – Name of the variable that holds the job arguments string passed to condor_worker.py; required in both the DAG file and the condor submit file.
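As a hypothetical fragment (assuming a job named jobA; VARS in the DAG file and $(...) macro expansion in the submit file are standard HTCondor syntax), the variable is set in the DAG file and consumed in the submit file:

```
# In the DAG file:
VARS jobA jobOpts="--opt1 val1"

# In the condor submit file:
arguments = $(jobOpts)
```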
-
add_job(job, requires=None, job_vars=None, retry=None)
Add a Job to the DAG.
Parameters: - job (Job) – Job object to be added to the DAG.
- requires (str, Job, iterable[str], iterable[Job], optional) – Individual or a collection of Jobs or job names that must run first before this job can run. i.e. the job(s) specified here are the parents, whilst the added job is their child.
- job_vars (str, optional) – String of job variables specifically for the DAG. Note that program arguments should be set in Job.args not here.
- retry (int or str, optional) – Number of retry attempts for this job. By default the job runs once, and if its exit code != 0, the job has failed.
Raises: - KeyError – If a Job with that name has already been added to the DAG.
- TypeError – If the job argument is not of type Job, or if the requires argument is not of type str, Job, iterable(str), or iterable(Job).
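The duplicate-name and requires-normalisation behaviour described above can be sketched in miniature. This is illustrative only (the Job and MiniDAG classes here are hypothetical stand-ins, not htcondenser's actual implementation):

```python
# Minimal sketch of add_job's checks: reject duplicates with KeyError,
# reject wrong types with TypeError, and normalise `requires` into a
# list of parent job names. Illustrative only, not htcondenser code.
class Job(object):
    def __init__(self, name):
        self.name = name

class MiniDAG(object):
    def __init__(self):
        self.jobs = {}  # job name -> list of parent job names

    def add_job(self, job, requires=None):
        if not isinstance(job, Job):
            raise TypeError('job must be a Job')
        if job.name in self.jobs:
            raise KeyError('Job %s already added to DAG' % job.name)
        if requires is None:
            parents = []
        elif isinstance(requires, (str, Job)):
            parents = [requires]
        else:
            parents = list(requires)
        self.jobs[job.name] = [p.name if isinstance(p, Job) else p
                               for p in parents]

dag = MiniDAG()
dag.add_job(Job('A'))
dag.add_job(Job('B'), requires='A')  # A is the parent, B the child
```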
-
check_job_acyclic(job)
Check that there are no circular requirements, e.g. A -> B -> A.
Get all requirements for all parent jobs recursively, and check for the presence of this job in that list.
Parameters: job (Job or str) – Job or job name to check.
Raises: RuntimeError – If the job has a circular dependency.
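The recursive walk described above can be sketched as follows, assuming a plain mapping from job name to parent names (a hedged illustration, not htcondenser's actual code):

```python
# Sketch of cycle detection: recursively follow parent requirements and
# fail if a job reappears on the current path. Illustrative only.
def check_job_acyclic(parents, job, path=()):
    if job in path:
        raise RuntimeError('Circular dependency involving %s' % job)
    for parent in parents.get(job, []):
        check_job_acyclic(parents, parent, path + (job,))

# A diamond (B and C both require D) is fine; only true cycles fail.
parents = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
check_job_acyclic(parents, 'A')  # no error raised
```

Tracking the path (rather than a global visited set) matters: shared ancestors in a diamond-shaped DAG are legal and must not be flagged as cycles.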
-
check_job_requirements(job)
Check that the required Jobs actually exist and have been added to the DAG.
Parameters: job (Job or str) – Job object or name of Job to check.
Raises: - KeyError – If the job has prerequisite jobs that have not been added to the DAG.
- TypeError – If the job argument is not of type str or Job, or an iterable of strings or Jobs.
-
generate_dag_contents()
Generate DAG file contents as a string.
Returns: DAG file contents
Return type: str
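A rough sketch of what assembling such contents involves. JOB, VARS, RETRY, and PARENT ... CHILD are real DAGMan keywords; the job-entry dicts and the function below are hypothetical, not htcondenser's actual code:

```python
# Illustrative sketch of building DAG file contents from job entries.
# The keywords (JOB, VARS, RETRY, PARENT/CHILD) are real DAGMan syntax;
# the surrounding code is a hypothetical stand-in.
def dag_contents(jobs):
    lines = []
    for j in jobs:
        lines.append('JOB %s %s' % (j['name'], j['submit_file']))
        if j.get('vars'):
            lines.append('VARS %s jobOpts="%s"' % (j['name'], j['vars']))
        if j.get('retry'):
            lines.append('RETRY %s %s' % (j['name'], j['retry']))
    for j in jobs:
        if j.get('parents'):
            lines.append('PARENT %s CHILD %s'
                         % (' '.join(j['parents']), j['name']))
    return '\n'.join(lines) + '\n'

jobs = [{'name': 'A', 'submit_file': 'a.condor'},
        {'name': 'B', 'submit_file': 'b.condor', 'retry': 3,
         'parents': ['A']}]
print(dag_contents(jobs))
```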
-
generate_job_requirements_str(job)
Generate a string of prerequisite jobs for this job.
Does a check to make sure that the prerequisite Jobs do exist in the DAG, and that DAG is acyclic.
Parameters: job (Job or str) – Job object or name of job.
Returns: Job requirements string if the job has prerequisites; otherwise an empty string.
Return type: str
Raises: TypeError – If the job argument is not of type str or Job.
-
generate_job_str(job)
Generate a string for a job, for use in the DAG file.
Includes the condor job file, any vars, and other options, e.g. RETRY. Job requirements (parents) are handled separately in another method.
Parameters: job (Job or str) – Job or job name.
Returns: Job listing for the DAG file.
Return type: str
Raises: TypeError – If the job argument is not of type str or Job.
-
get_jobsets()
Get a list of all unique JobSets managing Jobs in this DAG.
Returns: List of unique JobSet objects.
Return type: list
-
submit(force=False, submit_per_interval=10)
Write all necessary submit files, transfer files to HDFS, and submit the DAG. Also prints out info for the user.
Parameters: - force (bool, optional) – Force condor_submit_dag.
- submit_per_interval (int, optional) – Number of DAGMan submissions per interval. The default is 10 every 5 seconds.
Raises: CalledProcessError – If condor_submit_dag returns a non-zero exit code.
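The submission step can be sketched as building a condor_submit_dag command and running it through subprocess. The -f flag is the real "force" option of condor_submit_dag; the helper functions below are illustrative, not htcondenser's actual implementation:

```python
import subprocess

# Hedged sketch of DAG submission: build the command, then let
# check_call raise CalledProcessError on a non-zero exit code.
# Helper names are hypothetical; -f is a real condor_submit_dag flag.
def dag_submit_command(dag_filename, force=False):
    cmd = ['condor_submit_dag']
    if force:
        cmd.append('-f')  # force: overwrite any existing DAGMan files
    cmd.append(dag_filename)
    return cmd

def submit_dag(dag_filename, force=False):
    # Raises subprocess.CalledProcessError if submission fails.
    subprocess.check_call(dag_submit_command(dag_filename, force))
```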