Experiment Spec
- class beaker.ExperimentSpec(**data)
Bases: BaseModel
Experiments are the main unit of execution in Beaker.
An ExperimentSpec defines an Experiment.
- Examples:
>>> spec = ExperimentSpec(
...     budget="ai2/allennlp",
...     tasks=[
...         TaskSpec(
...             name="hello",
...             image=ImageSource(docker="hello-world"),
...             context=TaskContext(cluster="ai2/cpu-only"),
...             result=ResultSpec(
...                 path="/unused"  # required even if the task produces no output.
...             ),
...         ),
...     ],
... )
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- budget: str
The name of the budget account for your team. See https://beaker-docs.apps.allenai.org/concept/budgets.html for more details.
- version: SpecVersion
Must be ‘v2’ for now.
- classmethod from_file(path)
Load an ExperimentSpec from a YAML file.
- Return type:
- classmethod new(budget, task_name='main', description=None, cluster=None, beaker_image=None, docker_image=None, result_path='/unused', priority=None, **kwargs)
A convenience method for creating a new ExperimentSpec with a single task.
- Parameters:
task_name (str, default: 'main') – The name of the task.
description (Optional[str], default: None) – A description of the experiment.
cluster (Union[str, List[str], None], default: None) – The cluster or clusters where the experiment can run.
Tip
Omitting the cluster will allow your experiment to run on any on-premise cluster, but you can only do this with preemptible jobs.
beaker_image (Optional[str], default: None) – The beaker image name in the image source.
docker_image (Optional[str], default: None) – The docker image name in the image source.
Important
Mutually exclusive with beaker_image.
priority (Union[str, Priority, None], default: None) – The priority of the context.
kwargs – Additional kwargs are passed as-is to TaskSpec.
- Return type:
- Examples:
Create a preemptible experiment that can run on any on-premise cluster:
>>> spec = ExperimentSpec.new(
...     "ai2/allennlp",
...     docker_image="hello-world",
...     priority=Priority.preemptible,
... )
- with_task(task)
Return a new ExperimentSpec with an additional task.
- Parameters:
task (TaskSpec) – The task to add.
- Return type:
ExperimentSpec
- Examples:
>>> spec = ExperimentSpec(budget="ai2/allennlp").with_task(
...     TaskSpec.new(
...         "hello-world",
...         docker_image="hello-world",
...     )
... )
- with_description(description)
Return a new ExperimentSpec with a different description.
- Parameters:
description (str) – The new description.
- Return type:
ExperimentSpec
- Examples:
>>> ExperimentSpec(budget="ai2/allennlp", description="Hello, World!").with_description(
...     "Hello, Mars!"
... ).description
'Hello, Mars!'
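The with_* methods above are documented as returning a new spec rather than modifying the existing one. The following is a minimal standard-library sketch of that pattern only. `SketchSpec` is a hypothetical stand-in written for illustration; it is not the Beaker class and says nothing about how Beaker implements these methods.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class SketchSpec:
    """Hypothetical stand-in for a spec object; not the Beaker class."""

    budget: str
    description: Optional[str] = None
    tasks: Tuple[str, ...] = ()

    def with_description(self, description: str) -> "SketchSpec":
        # Build a copy with one field changed; `self` is left untouched.
        return replace(self, description=description)

    def with_task(self, task: str) -> "SketchSpec":
        # Append by creating a new tuple instead of mutating in place.
        return replace(self, tasks=self.tasks + (task,))


original = SketchSpec(budget="ai2/allennlp", description="Hello, World!")
updated = original.with_description("Hello, Mars!").with_task("hello")
```

Because every call returns a fresh object, calls can be chained and `original` can still be reused as a template for other variants.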
- class beaker.TaskSpec(**data)
Bases: BaseModel
A TaskSpec defines a Task within an ExperimentSpec.
Tasks are Beaker’s fundamental unit of work.
A Beaker experiment may contain multiple tasks. A task may also depend on the results of another task in its experiment, creating an execution graph.
- image: ImageSource
A base image to run, usually built with Docker.
- result: ResultSpec
Where the task will place output files.
- context: TaskContext
Context describes how and where this task should run.
- constraints: Optional[Constraints]
Each task can have many constraints, and each constraint can have many values. Constraints are rules that change where a task is executed, by influencing the scheduler’s placement of the workload.
Important
Because constraints depend on external configuration, a given constraint may be invalid or unavailable if a task is re-run at a future date.
- name: Optional[str]
Name is used for display and to refer to the task throughout the spec. It must be unique among all tasks within its experiment.
- command: Optional[List[Union[str, int, float]]]
Command is the full shell command to run as a sequence of separate arguments.
If omitted, the image’s default command is used, for example Docker’s ENTRYPOINT directive. If set, default commands such as Docker’s ENTRYPOINT and CMD directives are ignored.
Example: ["python", "-u", "main.py"]
- arguments: Optional[List[Union[str, int, float]]]
Arguments are appended to the command and replace default arguments such as Docker’s CMD directive.
If command is omitted, arguments are appended to the default command, Docker’s ENTRYPOINT directive.
Example: If command is ["python", "-u", "main.py"], specifying arguments ["--quiet", "some-arg"] will run the command python -u main.py --quiet some-arg.
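The interaction between command, arguments, and the image defaults can be sketched as a small pure function. This is an illustration of the rules as described above, not Beaker's implementation. The function name and the `image_entrypoint` and `image_cmd` parameters are hypothetical and stand in for the image's ENTRYPOINT and CMD.

```python
from typing import List, Optional, Sequence, Union

Arg = Union[str, int, float]


def effective_command(
    command: Optional[Sequence[Arg]],
    arguments: Optional[Sequence[Arg]],
    image_entrypoint: Sequence[str] = (),
    image_cmd: Sequence[str] = (),
) -> List[str]:
    """Hypothetical helper: combine spec fields with image defaults."""
    if command is not None:
        # An explicit command replaces both ENTRYPOINT and CMD.
        base = list(command)
        default_args: List[Arg] = []
    else:
        # No command: fall back to the image's ENTRYPOINT and CMD.
        base = list(image_entrypoint)
        default_args = list(image_cmd)
    # Explicit arguments replace the default arguments (CMD).
    tail = list(arguments) if arguments is not None else default_args
    return [str(part) for part in base + tail]


argv = effective_command(["python", "-u", "main.py"], ["--quiet", "some-arg"])
```

Here `argv` matches the documented example: the arguments are appended to the command as separate items.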
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- resources: Optional[TaskResources]
External hardware requirements, such as memory or GPU devices.
- leader_selection: bool
Enables leader selection for the replicas and passes the leader’s hostname to the replicas.
- propagate_failure: Optional[bool]
Determines if the whole experiment should fail if this task fails.
- propagate_preemption: Optional[bool]
Determines if all tasks should be preempted if this one task is.
- synchronized_start_timeout: Optional[int]
If set, jobs in the replicated task will wait to start, up to the specified timeout, until all other jobs are also ready. If the timeout is reached, the job will be canceled. Represented in nanoseconds; must be greater than zero and less than or equal to 48 hours.
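Since synchronized_start_timeout is expressed in nanoseconds, it is easy to be off by a factor of a thousand or more when writing the value by hand. The helper below is a hypothetical convenience, not part of Beaker, that converts a `timedelta` to nanoseconds and applies the documented bounds.

```python
from datetime import timedelta

NANOSECONDS_PER_MICROSECOND = 1_000
MAX_TIMEOUT = timedelta(hours=48)


def to_timeout_ns(timeout: timedelta) -> int:
    """Hypothetical helper: convert a timedelta to whole nanoseconds."""
    # Documented bounds: greater than zero, at most 48 hours.
    if not timedelta(0) < timeout <= MAX_TIMEOUT:
        raise ValueError("timeout must be > 0 and <= 48 hours")
    # timedelta has microsecond resolution, so count microseconds exactly
    # and then scale up to nanoseconds using integer arithmetic.
    microseconds = timeout // timedelta(microseconds=1)
    return microseconds * NANOSECONDS_PER_MICROSECOND


five_minutes_ns = to_timeout_ns(timedelta(minutes=5))
```

The resulting integer is what would be passed as the field value, for example 300000000000 for five minutes.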
- classmethod new(name, cluster=None, beaker_image=None, docker_image=None, result_path='/unused', priority=None, preemptible=None, **kwargs)
A convenience method for quickly creating a new TaskSpec.
- Parameters:
cluster (Union[str, List[str], None], default: None) – The cluster or clusters where the experiment can run.
Tip
Omitting the cluster will allow your experiment to run on any on-premise cluster, but you can only do this with preemptible jobs.
beaker_image (Optional[str], default: None) – The beaker image name in the image source.
Important
Mutually exclusive with docker_image.
docker_image (Optional[str], default: None) – The docker image name in the image source.
Important
Mutually exclusive with beaker_image.
priority (Union[str, Priority, None], default: None) – The priority of the context.
preemptible (Optional[bool], default: None) – If the task should be preemptible.
kwargs – Additional kwargs are passed as-is to TaskSpec.
- Return type:
TaskSpec
- Examples:
>>> task_spec = TaskSpec.new(
...     "hello-world",
...     cluster="ai2/cpu-cluster",
...     docker_image="hello-world",
... )
- with_image(**kwargs)
Return a new TaskSpec with the given image.
- Parameters:
kwargs – Keyword arguments that are passed directly to ImageSource.
- Return type:
TaskSpec
- Examples:
>>> task_spec = TaskSpec.new(
...     "hello-world",
...     docker_image="hello-world",
... ).with_image(beaker="hello-world")
>>> assert task_spec.image.beaker == "hello-world"
- with_result(**kwargs)
Return a new TaskSpec with the given result.
- Parameters:
kwargs – Keyword arguments that are passed directly to ResultSpec.
- Return type:
TaskSpec
- Examples:
>>> task_spec = TaskSpec.new(
...     "hello-world",
...     docker_image="hello-world",
... ).with_result(path="/output")
>>> assert task_spec.result.path == "/output"
- with_context(**kwargs)
Return a new TaskSpec with the given context.
- Parameters:
kwargs – Keyword arguments that are passed directly to TaskContext.
- Return type:
TaskSpec
- Examples:
>>> task_spec = TaskSpec.new(
...     "hello-world",
...     docker_image="hello-world",
... ).with_context(cluster="ai2/general-cirrascale")
>>> assert task_spec.context.cluster == "ai2/general-cirrascale"
- with_name(name)
Return a new TaskSpec with the given name.
- Parameters:
name (str) – The new name.
- Return type:
TaskSpec
- Examples:
>>> task_spec = TaskSpec.new(
...     "hello-world",
...     docker_image="hello-world",
... ).with_name("Hi there!")
>>> assert task_spec.name == "Hi there!"
- with_command(command)
Return a new TaskSpec with the given command.
- Return type:
TaskSpec
>>> task_spec = TaskSpec.new(
...     "hello-world",
...     docker_image="hello-world",
... ).with_command(["echo"])
>>> assert task_spec.command == ["echo"]
- with_arguments(arguments)
Return a new TaskSpec with the given arguments.
- Return type:
TaskSpec
>>> task_spec = TaskSpec.new(
...     "hello-world",
...     docker_image="hello-world",
... ).with_arguments(["Hello", "World!"])
>>> assert task_spec.arguments == ["Hello", "World!"]
- with_resources(**kwargs)
Return a new TaskSpec with the given resources.
- Parameters:
kwargs – Keyword arguments are passed directly to TaskResources.
- Return type:
TaskSpec
- Examples:
>>> task_spec = TaskSpec.new(
...     "hello-world",
...     docker_image="hello-world",
... ).with_resources(gpu_count=2)
>>> assert task_spec.resources.gpu_count == 2
- with_dataset(mount_path, **kwargs)
Return a new TaskSpec with an additional input dataset.
- Parameters:
mount_path (str) – The mount_path of the DataMount.
kwargs – Additional kwargs are passed as-is to DataMount.new().
- Return type:
TaskSpec
- Examples:
>>> task_spec = TaskSpec.new(
...     "hello-world",
...     docker_image="hello-world",
... ).with_dataset("/data/foo", beaker="foo")
>>> assert task_spec.datasets
- with_env_var(name, value=None, secret=None)
Return a new TaskSpec with an additional input env_var.
- Parameters:
- Return type:
TaskSpec
- Examples:
>>> task_spec = TaskSpec.new(
...     "hello-world",
...     docker_image="hello-world",
...     env_vars=[EnvVar(name="bar", value="secret!")],
... ).with_env_var("baz", value="top, top secret")
>>> assert len(task_spec.env_vars) == 2
- with_constraint(**kwargs)
Return a new TaskSpec with the given constraints.
- Return type:
TaskSpec
>>> task_spec = TaskSpec.new(
...     "hello-world",
...     docker_image="hello-world",
... ).with_constraint(cluster=['ai2/cpu-cluster'])
>>> assert task_spec.constraints['cluster'] == ['ai2/cpu-cluster']
- class beaker.ImageSource(**data)
Bases: BaseModel
ImageSource describes where Beaker can find a task’s image. Beaker will automatically pull, or download, this image immediately before running the task.
Attention
Exactly one of ‘beaker’ or ‘docker’ must be set.
- docker: Optional[str]
The tag of a Docker image hosted on the Docker Hub or a private registry.
Note
If the tag is from a private registry, the cluster on which the task will run must be pre-configured to enable access.
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
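The rule that exactly one of ‘beaker’ or ‘docker’ must be set can be checked before a spec is built. The function below is a hypothetical pre-check written for illustration; it does not describe how ImageSource itself validates its fields.

```python
from typing import Optional


def check_image_source(
    beaker: Optional[str] = None, docker: Optional[str] = None
) -> str:
    """Hypothetical pre-check: return the name of the field that is set."""
    # Both set or both missing are equally invalid.
    if (beaker is None) == (docker is None):
        raise ValueError("set exactly one of 'beaker' or 'docker'")
    return "beaker" if beaker is not None else "docker"


chosen = check_image_source(docker="hello-world")
```

The same "exactly one" shape applies to any group of mutually exclusive optional fields.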
- class beaker.EnvVar(**data)
Bases: BaseModel
An EnvVar defines an environment variable within a task’s container.
Tip
If neither ‘source’ nor ‘secret’ is set, the value of the environment variable will default to “”.
- name: str
Name of the environment variable following Unix rules. Environment variable names are case sensitive and must be unique.
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- class beaker.DataMount(**data)
Bases: BaseModel
Describes how to mount a dataset into a task. All datasets are mounted read-only.
See also
This is used in the TaskSpec.datasets property in TaskSpec.
- source: DataSource
Location from which Beaker will download the dataset.
- mount_path: str
The mount path is where Beaker will place the dataset within the task container. Mount paths must be absolute and may not overlap with other mounts.
Error
Because some environments use case-insensitive file systems, mount paths differing only in capitalization are disallowed.
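The mount path rules can be checked locally before submitting a spec. The sketch below is a hypothetical helper, not Beaker's validation. It assumes that "overlap" means two paths are equal or one is a parent directory of the other, and it compares paths case-insensitively to mirror the capitalization rule.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple


def check_mount_paths(mount_paths: Iterable[str]) -> List[str]:
    """Hypothetical helper: return a list of problems, empty if none."""
    problems: List[str] = []
    seen: List[Tuple[str, PurePosixPath]] = []
    for raw in mount_paths:
        if not PurePosixPath(raw).is_absolute():
            problems.append(f"{raw}: not absolute")
            continue
        # Lowercase first so paths differing only in capitalization collide.
        folded = PurePosixPath(raw.lower())
        for other_raw, other in seen:
            # Assumed meaning of overlap: equal, or one contains the other.
            if folded == other or other in folded.parents or folded in other.parents:
                problems.append(f"{raw}: overlaps with {other_raw}")
        seen.append((raw, folded))
    return problems


issues = check_mount_paths(["/data/foo", "/Data/Foo", "relative/path"])
```

An empty result means the set of mount paths passes these particular checks; the scheduler may still apply rules not covered here.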
- sub_path: Optional[str]
Sub-path to a file or directory within the mounted dataset. Sub-paths may be used to mount only a portion of a dataset; files outside of the mounted path are not downloaded.
For example, given a dataset containing a file /path/to/file.csv, setting the sub-path to path/to will result in the task seeing {mount_path}/file.csv.
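The effect of sub_path on what the task sees can be worked through with plain path arithmetic. The function below is an illustrative sketch only; its name and its treatment of leading slashes are assumptions, and it does not call Beaker.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Optional


def visible_files(
    dataset_files: Iterable[str], mount_path: str, sub_path: Optional[str] = None
) -> List[str]:
    """Hypothetical helper: map dataset files to paths inside the container."""
    # Treat both the sub-path and file names as relative to the dataset root.
    prefix = PurePosixPath((sub_path or "").strip("/"))
    mounted: List[str] = []
    for name in dataset_files:
        relative = PurePosixPath(name.lstrip("/"))
        try:
            inside = relative.relative_to(prefix)
        except ValueError:
            # Files outside the sub-path are not mounted.
            continue
        mounted.append(str(PurePosixPath(mount_path) / inside))
    return mounted


seen_by_task = visible_files(
    ["/path/to/file.csv", "/other/readme.txt"], "/data", sub_path="path/to"
)
```

With the sub-path set, only the file under path/to appears, directly beneath the mount path, which matches the documented example.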
- classmethod new(mount_path, sub_path=None, beaker=None, host_path=None, weka=None, result=None, secret=None)
A convenience method for quickly creating a new DataMount.
- Parameters:
mount_path (str) – The mount_path.
beaker (Optional[str], default: None) – The beaker argument to DataSource.
host_path (Optional[str], default: None) – The host_path argument to DataSource.
weka (Optional[str], default: None) – The weka argument to DataSource.
result (Optional[str], default: None) – The result argument to DataSource.
url – The url argument to DataSource.
secret (Optional[str], default: None) – The secret argument to DataSource.
- Return type:
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- class beaker.DataSource(**data)
Bases: BaseModel
Attention
Exactly one source field must be set.
- beaker: Optional[str]
The full name or ID of a Beaker dataset.
Tip
Beaker datasets provide the best download performance and are preferred for frequently used datasets.
- host_path: Optional[str]
Path to a file or directory on the host.
The executing host must be configured to allow access to this path or one of its parent directories. Currently the following host paths are allowed on every on-premise machine managed by the Beaker team:
/net for access to NFS.
/raid for access to RAID.
/var/beaker/share as a shared local scratch space.
- result: Optional[str]
Name of a previous task whose result will be mounted.
Important
A result source implies a dependency, meaning this task will not run until its parent completes successfully.
- secret: Optional[str]
Name of a secret within the experiment’s workspace which will be mounted as a plain-text file.
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
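Because a result source makes one task wait for another, the result sources in an experiment form a dependency graph. The sketch below shows one way to reason about a valid run order using the standard library. The task names and the `execution_order` helper are hypothetical, and this is not how Beaker schedules work.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def execution_order(result_sources: Dict[str, List[str]]) -> List[str]:
    """Hypothetical helper: order tasks so that parents come first.

    `result_sources` maps each task name to the names of the tasks whose
    results it mounts. A cycle raises graphlib.CycleError.
    """
    return list(TopologicalSorter(result_sources).static_order())


order = execution_order(
    {
        "preprocess": [],
        "train": ["preprocess"],
        "evaluate": ["train"],
    }
)
```

In this chain, preprocess comes before train, and train before evaluate; a task whose parent fails would never become runnable.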
- class beaker.TaskResources(**data)
Bases: BaseModel
TaskResources describe minimum external hardware requirements which must be available for a task to run. Generally, only a GPU request is necessary.
- cpu_count: Optional[float]
Minimum number of logical CPU cores. It may be fractional.
Examples: 4, 0.5.
Tip
Since CPU is only limited during periods of contention, it’s generally not necessary to specify this field.
- memory: Optional[str]
Minimum available system memory as a number with unit suffix.
Examples: 2.5GiB, 1024m.
Size of /dev/shm as a number with unit suffix. Defaults to 5GiB.
Examples: 2.5GiB, 1024m.
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- class beaker.TaskContext(**data)
Bases: BaseModel
Describes an execution environment, or how a task should be run.
Important
Because contexts depend on external configuration, a given context may be invalid or unavailable if a task is re-run at a future date.
- cluster: Optional[str]
The full name or ID of a Beaker cluster on which the task should run.
Attention
This field is deprecated. See TaskSpec.constraints instead.
- priority: Optional[Priority]
Set priority to change the urgency with which a task will run. Tasks with higher priority are placed ahead of tasks with lower priority in the queue.
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- class beaker.Constraints(**data)
Bases: BaseModel
Constraints are specified via the constraints field in TaskSpec.
This type also allows other fields that are not listed here.
- cluster: Optional[List[str]]
A list of cluster names or IDs on which the task is allowed to be executed. You are allowed to omit this field for tasks that have preemptible priority, in which case the task will run on any cluster where you have permissions.
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.
- class beaker.ResultSpec(**data)
Bases: BaseModel
Describes how to capture a task’s results.
Results are captured as datasets from the given location. Beaker monitors this location for changes and periodically uploads files as they change in near-real-time.
- model_computed_fields: ClassVar[dict[str, ComputedFieldInfo]] = {}
A dictionary of computed field names and their corresponding ComputedFieldInfo objects.