htcondenser.job module

Classes to describe an individual job, as part of a JobSet.

class htcondenser.job.Job(name, args=None, input_files=None, output_files=None, quantity=1, hdfs_mirror_dir=None)[source]

Bases: object

One job instance in a JobSet, with defined arguments and inputs/outputs.

Parameters:
  • name (str) – Name of this job. Must be unique within the managing JobSet, and within DAGMan.
  • args (list[str] or str, optional) – Arguments for this job.
  • input_files (list[str], optional) – List of input files to be transferred across before running the executable. If a path is not on HDFS, a copy will be placed on HDFS under hdfs_store/job.name. Otherwise, the original on HDFS will be used.
  • output_files (list[str], optional) –

    List of output files to be transferred across to HDFS after the executable finishes. If the path is on HDFS, then that will be the destination. Otherwise hdfs_mirror_dir will be used as the destination directory.

    e.g.
    myfile.txt => Job.hdfs_mirror_dir/myfile.txt
    results/myfile.txt => Job.hdfs_mirror_dir/myfile.txt
    /hdfs/A/B/myfile.txt => /hdfs/A/B/myfile.txt

  • quantity (int, optional) – Number of copies of this Job to submit.
  • hdfs_mirror_dir (str, optional) – Mirror directory for files to be put on HDFS. If not specified, the manager’s hdfs_mirror_dir with this Job’s name appended (i.e. hdfs_mirror_dir/self.name) will be used. If the directory does not exist, it is created.
Raises:
  • KeyError – If the user tries to create a Job in a JobSet which already manages a Job with that name.
  • TypeError – If the user tries to assign a manager that is not of type JobSet (or a derived class).
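
Example – an illustrative sketch only (the file names are placeholders, and attaching the Job to a JobSet is shown in outline since the JobSet API is documented separately):

    from htcondenser.job import Job

    # Placeholder file names for illustration only; input files should
    # exist locally before submission.
    job = Job(name='analysis_job_1',
              args=['input.txt', '--out', 'result.txt'],
              input_files=['input.txt'],
              output_files=['result.txt'],
              quantity=1)

    # The Job must then be attached to a managing JobSet, which assigns
    # job.manager, enforces unique names (KeyError otherwise) and drives
    # submission. See the JobSet documentation for the exact call; a
    # method such as job_set.add_job(job) is assumed here.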
generate_job_arg_str()[source]

Generate the argument string to pass to the condor_worker.py script.

This includes the user’s args (from self.args), as well as options for input and output files; the args are automatically updated to account for the new file locations on HDFS or the worker node. It also includes common input files from the managing JobSet.

Returns: Argument string for the job, to be passed to condor_worker.py
Return type: str
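
For illustration only, a sketch of calling this method directly on the job object from the example above; in normal use this string is generated as part of job submission rather than by the user:

    # Inspect the argument string that will be handed to condor_worker.py
    # for this job (the user's args plus input/output file options).
    arg_str = job.generate_job_arg_str()
    print(arg_str)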
manager

Returns the Job’s managing JobSet.

setup_input_file_mirrors(hdfs_mirror_dir)[source]

Attach a mirror HDFS location to each non-HDFS input file. Also attach a worker-node location, in case the user wishes to copy the input file from HDFS to the worker node before processing.

Will correctly account for the managing JobSet’s share_exe_setup preference. Since input_file_mirrors is used by generate_job_arg_str(), the exe/setup script must be added here, even though they are not transferred by the Job itself.

Parameters: hdfs_mirror_dir (str) – Location of directory to store mirrored copies.
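
The mapping can be pictured with a small standalone sketch (illustrative only: it does not reproduce the library’s internal input_file_mirrors representation, and the /hdfs prefix test and worker-node paths are assumptions):

    import os

    def sketch_input_mirrors(input_files, hdfs_mirror_dir):
        """Pair each input file with an HDFS location and a worker-node
        location, mimicking the idea behind setup_input_file_mirrors()."""
        mirrors = []
        for ifile in input_files:
            basename = os.path.basename(ifile)
            if ifile.startswith('/hdfs'):
                hdfs_path = ifile  # already on HDFS: use the original
            else:
                hdfs_path = os.path.join(hdfs_mirror_dir, basename)
            worker_path = basename  # i.e. in the worker node's scratch dir
            mirrors.append((ifile, hdfs_path, worker_path))
        return mirrors

    print(sketch_input_mirrors(['data/input.txt', '/hdfs/user/store/big.root'],
                               '/hdfs/user/analysis_job_1'))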
setup_output_file_mirrors(hdfs_mirror_dir)[source]

Attach a mirror HDFS location for each output file.

Parameters: hdfs_mirror_dir (str) – Location of directory to store mirrored copies.
transfer_to_hdfs()[source]

Transfer files across to HDFS.

Auto-creates the HDFS mirror directory if it doesn’t exist, but only if there is at least one file to transfer.

Will not transfer the exe or setup script if manager.share_exe_setup is True; that is left to the manager.
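
A brief usage sketch, reusing the job object from the earlier example (in typical use this is invoked as part of job submission rather than called by hand):

    # Copy this job's non-HDFS input files to their HDFS mirror locations.
    # The mirror directory is created if needed. If the managing JobSet
    # has share_exe_setup set to True, the exe/setup script are skipped;
    # the manager transfers those once for all its jobs.
    job.transfer_to_hdfs()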