Workflow queue

Motivation

You might have a couple of workflows that need to run one after another. This could be the case if one workflow runs analyses that produce data, and another post-processes the data. The workflows could also be long-running, so you wish to run them during the night and have the final post-processed results ready for review when you enter the office in the morning.

The workflow queue is useful in such situations. It allows you to put several workflows in a queue, and when you are ready, run them all in series, one by one. By default, the workflow queue is set up locally on your computer, but as we will get back to below, you can easily deploy a remote queue as well.

Your first queue

To prepare to queue our workflow from the previous examples, we export it to a JSON file, e.g. pasta.json, and type the following:

caliber workflow queue pasta.json -p mypinion

That’s it, our workflow is now in a queue! Notice that after the -p flag, you specify the name of the Pinion that should run the workflow.

Under the hood of the queue

As soon as you type caliber workflow queue ..., the workflow object performs two tasks. First, the workflow is stored in a special Workload object. Think of the workload as a wrapper for the workflow, containing the workflow itself, the content of all attached files, and metadata such as the pinion name, a timestamp, a flag indicating whether the workflow run has completed, and the id of the object holding the results returned from the workflow. To ensure that only you can run your workflows, or conversely, that you created the workflows you attempt to run, the workload also calculates a run secret that the pinion will try to match. Second, the workflow puts the workload in a WorkflowQueue. The WorkflowQueue contains a list of all the workloads that have been put in the queue.
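To make the structure more concrete, here is a minimal sketch of what a Workload and a WorkflowQueue could look like. The field names and the run-secret scheme are illustrative assumptions for this sketch, not Caliber's actual implementation:

```python
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Workload:
    workflow: dict                    # the exported workflow definition
    attached_files: dict              # filename -> file content
    pinion_name: str                  # the Pinion that should run this workload
    timestamp: float = field(default_factory=time.time)
    completed: bool = False           # set once the workflow run has finished
    results_id: Optional[str] = None  # id of the stored results object

    def run_secret(self) -> str:
        # A deterministic secret derived from the workload content; a pinion
        # holding the same inputs can recompute it and check for a match.
        payload = json.dumps(self.workflow, sort_keys=True) + self.pinion_name
        return hashlib.sha256(payload.encode()).hexdigest()


@dataclass
class WorkflowQueue:
    workloads: list = field(default_factory=list)  # all queued workloads

    def put(self, workload: Workload) -> None:
        self.workloads.append(workload)
```

The point of the wrapper is that everything a pinion needs, including the files and the secret to match, travels together with the workflow.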

Examine the queue

After having established the queue, we can examine it using the caliber queue ... commands in the CLI:

  1. List the queue by typing caliber queue list.

  2. View one of the entries by typing caliber queue view <workload-id>.

  3. Delete an entry by typing caliber queue reset <workload-id>.

  4. Reset the run secret of an entry by typing caliber queue resetsecret <workload-id>.

Run the queue

Running workflows in a queue is handled by the Pinion. Think of the pinion as an active process on your computer that checks the queue for relevant work to perform. You specify the number of attempts, and how long to wait between each attempt.

To run the queue, type the following:

caliber pinion init mypinion

This command spins up a Pinion with the name mypinion. By default, it will attempt to find work 10 times with 5 seconds between each query. This can be specified using the -a and -w flags, respectively, e.g.:

# Pinion that performs 100 attempts with 10 seconds between each attempt
caliber pinion init mypinion -a 100 -w 10

Under the hood of the pinion

When the pinion is initialized, it makes a number of attempts to find a workflow to run. For each attempt, it checks whether it can find a WorkflowQueue, and whether it can match the run secret of any of the Workloads in the queue. If the run secret is matched, the pinion creates the attached files, instantiates the workflow and calls the .run() method of the workflow. If the workflow completes without errors, the returns from all the tasks are collected in a special Results object, which is stored for later.
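The polling behaviour can be sketched roughly as follows. Here find_queue and run_workload are hypothetical stand-ins for Caliber's internals, injected as arguments so the loop itself is easy to follow:

```python
import time


def run_pinion(find_queue, run_workload, attempts=10, wait=5, sleep=time.sleep):
    """Poll for work `attempts` times, waiting `wait` seconds between tries."""
    for _ in range(attempts):
        queue = find_queue()  # returns a list of workloads, or None
        if queue is not None:
            for workload in queue:
                # only run workloads whose run secret matches and that
                # have not already completed
                if workload["secret_matches"] and not workload["completed"]:
                    run_workload(workload)  # create files, instantiate, .run()
                    workload["completed"] = True
        sleep(wait)
```

This also explains the -a and -w flags from above: they simply bound how many times the loop runs and how long it sleeps between iterations.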

Access results

While running the workflow, the Pinion collects the returns from the functions in the tasks of the workflow and stores them for later use. The returns can be accessed using the get_results() function. With this functionality, you can separate your concerns, for example by creating one workflow that produces data and another that accesses and visualizes the data produced by the first.

Note that in order to store the results for later use, they need to be serializable to JSON. If any object among the results cannot be serialized, a string representation of that object is stored instead.
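The fallback can be sketched like this; the function name is illustrative, not part of Caliber's API:

```python
import json


def serializable_result(value):
    """Return the value if it is JSON-serializable, else its string form."""
    try:
        json.dumps(value)
        return value
    except (TypeError, ValueError):
        return str(value)
```

In other words, plain values like numbers, strings, lists and dicts come back unchanged, while arbitrary objects degrade to whatever their string representation happens to be.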

Best practice

By using the above functionality, we can extend our best practice.

For each workflow that you have prepared:

  1. Define the workflow in a Python script, and export the workflow definition to a JSON file.

  2. Visualize the workflow by typing caliber workflow show ....

  3. Queue the workflow by typing caliber workflow queue ....

After having queued all the workflows that you have prepared:

  1. Initialize the Pinion and start running the queue by typing caliber pinion init ....

Alternatively, you may have a separate terminal window open with a Pinion running and continuously listening for work. Either way, this structure lets you efficiently queue your work, or create a sharper separation between workflow definition, verification and execution.

Remote workflow queue

You can also deploy your queue as a remote workflow queue. This might be desirable if you define a workflow on your laptop that should run on a faster computer somewhere else. Caliber relies on Speckle to do the heavy lifting. Before continuing, make sure you have Speckle Manager installed, and that you have registered your Speckle accounts in Speckle Manager.

Speckle

Speckle is the open-source data platform that provides honest interoperability, real-time collaboration, data management, versioning and automation. Caliber connects with your Speckle server and lets you exchange workflows and store results. Consider reviewing the core concepts of Speckle before continuing.

To set up a remote workflow queue, you need to specify a couple of details:

  1. A Speckle host, e.g. https://app.speckle.systems.

  2. A project to use for data transfer, e.g. 12345abcde.

  3. A model on the project, e.g. caliber. If you do not provide a model, main will be used as default, and if you provide a model that is not present on the project, Caliber will create it for you.

  4. A valid token to use for authentication. If you do not provide a token, Caliber will browse the accounts you have registered in Speckle Manager and use the token from the account that matches the Speckle host that you have provided. Note that Speckle Manager automatically creates a token for you to use, but you can also create your own.

You can set these details as environment variables. The project and model can also be overridden directly in the terminal in the caliber workflow queue ... and caliber pinion init ... commands, as well as in the caliber queue ... commands.
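The account matching described in step 4 can be sketched as follows; the data shapes are assumptions for illustration, not Speckle Manager's actual account format:

```python
def token_for_host(accounts, host):
    """Pick the token of the registered account matching the given host."""
    for account in accounts:
        if account["host"] == host:
            return account["token"]
    return None  # no registered account matches the host
```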

When you run your workflow by sending it through Speckle, all the returns from your tasks will be collected and sent to the Speckle server when the workflow is completed. With your results safely stored on the Speckle server, you can view them in the Speckle web app, define a separate workflow that fetches the results and performs some post-processing, or even view your results in the 3D-viewer if your results contain some sort of geometry objects.

Remember to attach your input files if you wish to run the workflow remotely. In our case, we could have attached pasta_functions.py to make our workflow self-contained. Note that currently, Speckle only supports attaching text-based files, not binary files.

How to find the Speckle project id

The Speckle project id is a ten-character hash, e.g. 12345abcde, which identifies a project on a Speckle server. To find the project id, open a web browser and navigate to your Speckle server, e.g. app.speckle.systems. Next, open a project by clicking Projects in the menu on the left and selecting one of the projects that appear in the list. If no projects appear on your screen, you should be able to locate a big blue button for creating a new one. Having opened the project, the project id is shown in the address field of your browser, e.g. https://app.speckle.systems/projects/12345abcde.
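If you prefer to extract the id programmatically, a small helper can pull it out of the project URL. The function is an illustrative sketch, not part of Caliber:

```python
from urllib.parse import urlparse


def project_id_from_url(url):
    """Extract the project id from a Speckle project URL."""
    # e.g. https://app.speckle.systems/projects/12345abcde -> "12345abcde"
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) >= 2 and parts[0] == "projects":
        return parts[1]
    return None
```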

Persist your data!

Use a remote workflow queue on the Speckle server to persist your data. This makes it easy to create separate workflows for analyses and post-processing, and you can even share your results with others who have access to your Speckle server 🚀