DAG
DAGs (Directed Acyclic Graphs) are similar to Steps
, and in Hera, the syntax using context managers is exactly the same as Steps
.
DAGs are formed of Tasks
, and offer more flexibility in Workflow construction, with the key difference being that you
specify the dependencies of each Task
, i.e. which other Tasks
must run to completion before running this one.
Specifying Dependencies
The classic example for the DAG
is the “diamond”:
from hera.workflows import DAG, Workflow, script
@script()
def echo(message):
print(message)
with Workflow(generate_name="dag-diamond-", entrypoint="diamond") as w:
with DAG(name="diamond"):
A = echo(name="A", arguments={"message": "A"})
B = echo(name="B", arguments={"message": "B"})
C = echo(name="C", arguments={"message": "C"})
D = echo(name="D", arguments={"message": "D"})
A >> [B, C] >> D
Notice the >>
(rshift) syntax used with the returned objects from the echo
calls; it specifies the left-hand-side of
the operator is a dependency of the right-hand-side, i.e. A >> B
means “B depends on A”. This is syntactic sugar for
A.next(B)
, see the Task dependencies explained section for more examples.
The echo
calls are returning Task
objects as the function is being called under a DAG
context. Then, we can
specify dependencies between Tasks
and lists of Tasks
using the >>
syntax. A list acts as a boolean and
of the
elements, and it is important to note that a list cannot appear first in the chain or on both sides of a >>
.
Here, with A
first in the chain, it has no dependencies, so will run first. Then, with [B, C]
depending on A
, B
and C
will both run in parallel once A
has completed. Finally, D
depends on B
and C
, so will run once they
have both completed. We can use the rshift syntax anywhere with Tasks
, but it makes sense to keep it as the last
line of the DAG
context.
Parallel Steps Example as a DAG
If we look at the parallel steps example from the Steps walk through, we can write a Workflow
using a DAG
that behaves in the same way.
Looking at the with Steps
section:
with Steps(name="steps") as s:
echo(name="pre-parallel", arguments={"message": "Hello world!"})
with s.parallel():
echo(name="parallel-1", arguments={"message": "I'm parallel-1!"})
echo(name="parallel-2", arguments={"message": "I'm parallel-2!"})
echo(name="parallel-3", arguments={"message": "I'm parallel-3!"})
echo(name="post-parallel", arguments={"message": "Goodbye world!"})
First, remember to change your imports to get the DAG
class:
Then, going line-by-line, we can build a DAG starting by changing the context manager to a DAG
, and we can keep the
pre-parallel
call the same, as it will now create a Task
. However, we need to keep track of the created Task
, so
let’s assign it to a variable:
Next, DAG
s don’t have a parallel
function so we can remove that line and de-indent the parallel steps, and we can
leave post-parallel
as is. Remember we still need to assign variables!
with DAG(name="dag") as d:
pre = echo(name="pre-parallel", arguments={"message": "Hello world!"})
parallel_1 = echo(name="parallel-1", arguments={"message": "I'm parallel-1!"})
parallel_2 = echo(name="parallel-2", arguments={"message": "I'm parallel-2!"})
parallel_3 = echo(name="parallel-3", arguments={"message": "I'm parallel-3!"})
post = echo(name="post-parallel", arguments={"message": "Goodbye world!"})
Finally, we need to specify the dependencies, which will look very similar to the DAG diamond example.
with DAG(name="dag") as d:
pre = echo(name="pre-parallel", arguments={"message": "Hello world!"})
parallel_1 = echo(name="parallel-1", arguments={"message": "I'm parallel-1!"})
parallel_2 = echo(name="parallel-2", arguments={"message": "I'm parallel-2!"})
parallel_3 = echo(name="parallel-3", arguments={"message": "I'm parallel-3!"})
post = echo(name="post-parallel", arguments={"message": "Goodbye world!"})
pre >> [parallel_1, parallel_2, parallel_3] >> post
Task dependencies explained
Any Tasks
without a dependency defined will start immediately.
Dependencies between Tasks can be described using the convenience syntax >>
, which follows the default
depends logic of Argo, for example:
describes the relationships:
- “A has no dependencies (so starts immediately)
- “B depends on
A.Succeeded || A.Skipped || A.Daemoned
.
As a diagram:
A >> B
is equivalent to A.next(B)
.
Lists of Tasks
A list of Tasks used with the rshift syntax helps to describe multiple dependencies at once.
- A single Task on the left side of
>>
and a list Tasks on the right side is shorthand for specifying that each Task in the list independently depends on the single left-side Task and will all start once that Task has a task result ofSucceeded || Skipped || Daemoned
- A list of Tasks on the left of
>>
and a single Task on the right describes that the single Task will only run once all the Tasks in the list have the task result ofSucceeded || Skipped || Daemoned
- A list of Tasks on both sides of
>>
is not supported, and multiple dependency statements should be used
For example:
describes the relationships:
- “A has no dependencies
- “B depends on A; C depends on A”
- “D depends on B AND C”
As a diagram:
Dependencies can be described over multiple statements:
describes the relationships:
- “A and B have no dependencies
- “C depends on A AND B”
- “D depends on A AND B”
As a diagram: