WDL assertion statements

This lab will walk through prototyping a significant extension to the WDL language: an assert statement to be checked at runtime. For example,

version development
workflow div {
    input {
        Int numerator
        Int denominator
    }
    assert denominator != 0
    output {
        Int quotient = numerator / denominator
    }
}

We’ll cover adding the new construct to the syntax tree and language grammar, implementing runtime assertion checking for tasks and workflows, and adding appropriate cases to the unit test suite. This builds on the basic miniwdl development process introduced in the previous codelab on adding library functions.

Refer to the complete git diff for this example on the codelab-assert branch.

Syntax tree model

Among existing language constructs, the new assertion statement is most similar to a value declaration (WDL.Decl), insofar as it entails evaluating a single WDL expression once the values it depends on are available. Imagine the statement assert :expr: translating to the declaration Boolean some_unique_name = :expr:, with the special behavior that the runner should raise an error if the expression evaluates to false.

We’ll introduce the new statement to the syntax tree model by subclassing WDL.Decl in WDL/Tree.py,

class Assertion(Decl):
    message: str

    def __init__(self, pos: SourcePosition, expr: Expr.Base) -> None:
        super().__init__(pos, Type.Boolean(), f"_assert_L{pos.line}C{pos.column}", expr)
        self.message = f"assertion failed: {str(expr)} ({pos.uri} Ln {pos.line} Col {pos.column})"

The new subclass initializes its WDL.Decl base with the boolean type, an arbitrary unique name, and the WDL expression to be evaluated. Actual WDL declaration names can’t start with an underscore, so we can be sure this name won’t cause collisions.

A major advantage of this approach is that the workflow runner “sees” the new construct as a declaration, inheriting all the existing logic for analyzing the expression’s dependencies and scheduling its evaluation. It also inherits the static type-checking logic to verify that the expression can only evaluate to a boolean value.

Lastly, our constructor pre-computes the assertion failure message for later use.
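
To see what the constructor produces without installing miniwdl, here is a minimal sketch using stand-in classes. The simplified SourcePosition and Decl below are assumptions for illustration only (the real WDL classes carry more fields and use Type and Expr objects rather than strings), as is the file name div.wdl.

```python
from dataclasses import dataclass


# Hypothetical stand-in for WDL.SourcePosition (the real class has more fields).
@dataclass(frozen=True)
class SourcePosition:
    uri: str
    line: int
    column: int


# Hypothetical stand-in for WDL.Decl, with the type and expression kept as strings.
class Decl:
    def __init__(self, pos, type_name, name, expr):
        self.pos = pos
        self.type_name = type_name
        self.name = name
        self.expr = expr


class Assertion(Decl):
    def __init__(self, pos, expr):
        # Boolean type plus a name derived from the source position.
        super().__init__(pos, "Boolean", f"_assert_L{pos.line}C{pos.column}", expr)
        # Pre-compute the failure message for later use by the runner.
        self.message = (
            f"assertion failed: {expr} ({pos.uri} Ln {pos.line} Col {pos.column})"
        )


# The assert statement in the opening example sits on line 7, column 5.
node = Assertion(SourcePosition("div.wdl", 7, 5), "denominator != 0")
print(node.name)     # _assert_L7C5
print(node.message)  # assertion failed: denominator != 0 (div.wdl Ln 7 Col 5)
```

Because the node is still a Decl, any code that handles declarations generically can process it unchanged, and only the code that cares about assertions needs an isinstance check.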

Grammar and parser

Miniwdl uses Lark for basic lexing/parsing, entailing (i) an EBNF grammar hard-coded as a lengthy string in WDL/_grammar.py, and (ii) “transformers” to map Lark’s low-level syntax tree onto the WDL.Tree object model, found in WDL/_parser.py. The former source file furthermore includes several versions of the grammar for different WDL language versions, and the latter the mechanism for choosing the appropriate grammar based on the version declaration in the source code. We’ll introduce our new assertion statement to the “development” version of the grammar.

In WDL/_grammar.py we add the production,

...
versions[
    "development"
] = r"""
...
assertion: "assert" expr
...

This reuses the existing expr production. Then we let the statement appear in the body of a task, alongside any of its “non-input” declarations:

task: "task" CNAME "{" task_section* command task_section* "}"
?task_section: input_decls
             | output_decls
             | meta_section
             | runtime_section
             | any_decl -> noninput_decl
             | assertion -> noninput_decl

(We also add assert to the list of language keywords that can’t be used as names.)

Next, in WDL/_parser.py we add to the _DocTransformer class,

...
class _DocTransformer:
    ...

    def assertion(self, items, meta):
        return Tree.Assertion(self._sp(meta), items[0])

Lark calls this method when the new assert statement appears in the source code, upon which we create an instance of our new Assertion node based on the source position and expression.
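
The dispatch-by-rule-name idea can be illustrated without Lark. The following is a toy sketch, not Lark's implementation: ToyTransformer and the tuple-based tree are hypothetical, and source-position metadata is omitted. It walks the tree bottom-up and calls the method named after each rule with the already-transformed children. The literal "assert" keyword does not appear among the children here, which mirrors why the real method can take the expression from items[0].

```python
# Toy parse tree node: (rule_name, [children]); leaves are plain strings.
# This is a hypothetical stand-in for Lark's Tree and Token objects.
class ToyTransformer:
    def transform(self, node):
        if not isinstance(node, tuple):
            return node  # leaf token, passed through unchanged
        rule, children = node
        # Transform children first so handlers receive finished values.
        items = [self.transform(child) for child in children]
        handler = getattr(self, rule, None)
        # Rules without a handler are left as (rule, items) tuples.
        return handler(items) if handler else (rule, items)

    def expr(self, items):
        # Collapse the expression tokens into one string for illustration.
        return " ".join(items)

    def assertion(self, items):
        # Counterpart of _DocTransformer.assertion, minus the source position.
        return {"kind": "Assertion", "expr": items[0]}


tree = ("assertion", [("expr", ["denominator", "!=", "0"])])
result = ToyTransformer().transform(tree)
print(result)  # {'kind': 'Assertion', 'expr': 'denominator != 0'}
```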

Even though we haven’t yet added the logic to actually check the asserted condition at runtime, we can begin writing our test cases to check that the new statement is recognized syntactically. In tests/test_7runner.py we add a new test case class deriving from the RunnerTestCase(unittest.TestCase) helper,

...
class TestAssert(RunnerTestCase):
    task1 = R"""
    version development
    task div {
        input {
            Int numerator
            Int denominator
        }
        assert denominator != 0
        command {
            expr ~{numerator} / ~{denominator}
        }
        output {
            Int quotient = read_int(stdout())
        }
    }
    """

    def test_positive(self):
        outputs = self._run(self.task1, {"numerator": 7, "denominator": 2})
        self.assertEqual(outputs["quotient"], 3)

The self._run() helper method runs the given WDL on the inputs and returns the outputs. Run the new test with python3 -m unittest tests.test_7runner.TestAssert; the runner evaluates the assertion as if it were a declaration, but doesn’t yet care about the outcome.

Checking assertions in tasks

The task runtime logic is found in WDL/runtime/task.py. We’ll add the assertion checking logic to the _eval_task_inputs subroutine, which is responsible for evaluating all the task’s input and “post-input” declarations.

def _eval_task_inputs(...):
    ...
    for decl in decls_to_eval:
        ...
            v: Value.Base = decl.expr.eval(container_env, stdlib=stdlib).coerce(decl.type)
        ...
        container_env = container_env.bind(decl.name, v)
        if isinstance(decl, Tree.Assertion) and not v.value:
            raise Error.RuntimeError(decl.message)

By this point the subroutine has already sorted the declarations in order of their dependencies, and the excerpted loop proceeds to evaluate each one and bind the value in the environment. We simply add a check to see if decl is actually an instance of our Assertion subclass, and if so, raise an error if the value is false.
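
The shape of that loop can be sketched in isolation. This is a simplified model under stated assumptions, not the code in WDL/runtime/task.py: expressions are plain Python callables over a dict environment, the declarations are assumed to be already sorted, eval_decls is a hypothetical name, and the built-in RuntimeError stands in for WDL.Error.RuntimeError.

```python
# Hypothetical stand-ins: an expression is a callable taking the environment dict.
class Decl:
    def __init__(self, name, expr):
        self.name = name
        self.expr = expr


class Assertion(Decl):
    def __init__(self, name, expr, message):
        super().__init__(name, expr)
        self.message = message


def eval_decls(decls, inputs):
    """Evaluate declarations in the given (already dependency-sorted) order."""
    env = dict(inputs)
    for decl in decls:
        v = decl.expr(env)
        env[decl.name] = v
        # The only assertion-specific logic: fail if the bound value is false.
        if isinstance(decl, Assertion) and not v:
            raise RuntimeError(decl.message)  # stand-in for Error.RuntimeError
    return env


decls = [
    Assertion(
        "_assert_L7C5",
        lambda env: env["denominator"] != 0,
        "assertion failed: denominator != 0",
    ),
    Decl("quotient", lambda env: env["numerator"] // env["denominator"]),
]

print(eval_decls(decls, {"numerator": 7, "denominator": 2})["quotient"])  # 3
```

With denominator set to 0, the assertion raises before the quotient declaration is evaluated, so the caller sees the assertion message rather than a division error.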

Now we can add to our TestAssert class a new case checking that the assertion causes the task to fail:

class TestAssert(RunnerTestCase):
    ...
    def test_negative(self):
        self._run(self.task1, {"numerator": 7, "denominator": 0}, expected_exception=WDL.Error.RuntimeError)

Assertions in workflows

Finally, we can enable assertions in workflows as well as tasks. Add our assertion production to the grammar for workflow bodies (and scatter/if section bodies):

workflow: "workflow" CNAME "{" workflow_element* "}"
?workflow_element: input_decls | any_decl | call | scatter | conditional | workflow_outputs | meta_section | assertion

scatter: "scatter" "(" CNAME "in" expr ")" "{" inner_workflow_element* "}"
conditional: "if" "(" expr ")" "{" inner_workflow_element* "}"
?inner_workflow_element: any_decl | call | scatter | conditional | assertion

The workflow runner in WDL/runtime/workflow.py will now treat assertion statements as declarations and schedule their evaluation appropriately. We just need to add a check on the value in the corresponding logic,

class StateMachine:
    ...
    def _do_job(self, job):
        ...
        if isinstance(job.node, Tree.Decl):
            ...
                v = job.node.expr.eval(env, stdlib=stdlib).coerce(job.node.type)
            ...
            if isinstance(job.node, Tree.Assertion) and not v.value:
                raise Error.RuntimeError(job.node.message)
            return Env.Bindings(Env.Binding(job.node.name, v))
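
To illustrate why treating an assertion as a declaration is enough for it to be scheduled after the values it refers to, here is a sketch that uses Python's standard graphlib for ordering. This is a swapped-in technique for illustration and not how StateMachine works internally. The node table, the run function, and the node names are all hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical node table: name -> (dependencies, expression, is_assertion).
# Dependencies list other nodes only; workflow inputs come from the environment.
nodes = {
    "half": (("total",), lambda env: env["total"] // 2, False),
    "_assert_L9C5": (("total",), lambda env: env["total"] > 0, True),
    "total": ((), lambda env: env["a"] + env["b"], False),
}


def run(nodes, inputs):
    env = dict(inputs)
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(deps) for name, (deps, _, _) in nodes.items()}
    for name in TopologicalSorter(graph).static_order():
        _, expr, is_assertion = nodes[name]
        v = expr(env)
        if is_assertion and not v:
            raise RuntimeError(f"assertion failed: {name}")
        env[name] = v
    return env


print(run(nodes, {"a": 3, "b": 5})["half"])  # 4
```

The assertion is listed before the node it depends on, yet it is evaluated only after total is available, because its dependencies are tracked the same way as any other declaration's.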

And add tests,

class TestAssert(RunnerTestCase):
    ...
    wf1 = R"""
    version development
    workflow div {
        input {
            Int numerator
            Int denominator
        }
        assert denominator != 0
        output {
            Int quotient = numerator / denominator
        }
    }
    """

    def test_workflow(self):
        outputs = self._run(self.wf1, {"numerator": 7, "denominator": 2})
        self.assertEqual(outputs["quotient"], 3)
        self._run(self.wf1, {"numerator": 7, "denominator": 0}, expected_exception=WDL.Error.RuntimeError)

Next steps

Refer to the complete git diff for this example on the codelab-assert branch.

To further polish our prototype, we might:

  • Add a WDL.runtime.AssertionFailed exception for increased specificity compared to RuntimeError
  • Test that the type-checker accepts only boolean expressions
  • Test that assertions referring to intermediate values and/or call outputs work as expected
  • Allow assertions referring to task and workflow outputs in the respective sections
  • Accept an optional custom error message, e.g. assert denominator != 0, "division by zero", and propagate it through the raised exception
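
As a starting point for the first and last items, a dedicated exception with an optional custom message might look like the sketch below. The class shape and the check helper are assumptions for illustration. In miniwdl the exception would more plausibly derive from WDL.Error.RuntimeError, whereas this stand-in derives from Python's built-in RuntimeError to stay self-contained.

```python
class AssertionFailed(RuntimeError):
    """Hypothetical exception raised when a WDL assert statement is false."""

    def __init__(self, expr_text, message=None):
        self.expr_text = expr_text
        self.custom_message = message
        detail = f"assertion failed: {expr_text}"
        if message:
            # Append the author's custom message when one was supplied.
            detail += f": {message}"
        super().__init__(detail)


def check(value, expr_text, message=None):
    # Hypothetical helper mirroring the runner's check on the evaluated value.
    if not value:
        raise AssertionFailed(expr_text, message)


try:
    check(False, "denominator != 0", "division by zero")
except AssertionFailed as exc:
    print(exc)  # assertion failed: denominator != 0: division by zero
```

Deriving from the existing runtime error type would keep tests that expect the broader exception passing, while letting callers catch assertion failures specifically.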