Global data object
Consider a workflow that contains tasks with the following functionality:
1. Read input data and perform intermediate calculations.
2. Perform the main calculation based on the results from the previous step.
3. Report the results.
How would you make the results from step 1 available for step 2? One solution could be to write the results to a file in step 1, and later use this file as input in step 2. This is a good solution if step 2 is a program call that actually relies on reading input files, but for a more data-driven workflow with pure Python functions, this is not optimal.
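To make the file-based alternative concrete, here is a minimal sketch of two steps that hand over an intermediate result through a JSON file. The function names step_1 and step_2, the file name intermediate.json, and the choice of JSON are assumptions made for illustration; nothing here is part of caliber.

```python
import json
import tempfile
from pathlib import Path


def step_1(path):
    """Perform an intermediate calculation and store the result in a file."""
    numbers = [1.0, 4.2, 3.14, 6.3]  # illustrative input data
    path.write_text(json.dumps({'sum': sum(numbers)}))


def step_2(path):
    """Read the intermediate result back from the file and use it."""
    intermediate = json.loads(path.read_text())
    return intermediate['sum'] ** 2


# Hypothetical hand-over file inside a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    result_file = Path(tmp) / 'intermediate.json'
    step_1(result_file)
    square = step_2(result_file)
```

Every value that crosses from one step to the next has to be serialised and parsed again, which is the extra work a shared in-memory object avoids.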
A better solution would be to have a common space that different functions can access for storing and reading data as pure Python objects. This is where the global data object g comes into play. g is part of the context that is set up when caliber is imported in your Python scripts or modules.
Import it in your modules
If you have a data-driven workflow based on functions collected in a separate module, use g to share data between those functions. There are two ways to reach it: import caliber in your module and access g as caliber.context.g, or import the context directly (from caliber import context) and access g as context.g.
Under the hood of the global data object
The global data object is a namespace object, i.e. a container for symbolic names along with the objects that those names refer to. Think of it as a custom dictionary. In fact, g implements common dict methods like .get(), .pop(), .setdefault(), and .clear(), e.g. context.g.get('answer', 42). You can even convert it to a dict with dict(context.g), or check whether an attribute is set with 'answer' in context.g.
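To get a feeling for what such a namespace object could look like, here is a minimal sketch. The class name Namespace is hypothetical and this is not caliber's actual implementation; it only mimics the behaviour described above by storing attributes in the instance's __dict__ and forwarding the dict-style methods to it.

```python
class Namespace:
    """Hypothetical stand-in for a namespace object such as g."""

    def get(self, name, default=None):
        return self.__dict__.get(name, default)

    def pop(self, name, *default):
        # Like dict.pop: raises KeyError unless a default is supplied
        return self.__dict__.pop(name, *default)

    def setdefault(self, name, default=None):
        return self.__dict__.setdefault(name, default)

    def clear(self):
        self.__dict__.clear()

    def __contains__(self, name):
        # Enables: 'answer' in g
        return name in self.__dict__

    def __iter__(self):
        # Yield (name, value) pairs so that dict(g) works
        return iter(self.__dict__.items())


g = Namespace()
g.answer = 42                             # set by plain attribute assignment
as_dict = dict(g)                         # dict representation
has_answer = 'answer' in g                # membership check
fallback = g.get('question', 'unknown')   # default for a missing name
```

Plain attribute assignment keeps the functions readable, while the dict-style methods cover the cases where a name may or may not have been set yet.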
In the example below, functions.py defines three functions representing the functionality described above. g is available to the functions through context after the import on line 1. Notice that only sum_list() takes arguments; the other functions operate on data in g. The workflow is defined in main.py. Create main.json by running py main.py, then run the workflow by typing caliber workflow run main.json.
Try setting different values on line 9 in main.py to see the workflow operate on another set of input. Also, try commenting out line 6 in functions.py to trigger the else branch in print_result().
functions.py

 1  from caliber import context, print
 2
 3
 4  def sum_list(*args):
 5      """Sum numbers in a list."""
 6      context.g.input = args
 7      context.g.sum = sum(args)
 8
 9
10  def square():
11      """Calculate the square of a number."""
12      context.g.square = context.g.sum**2
13
14
15  def print_result():
16      """Prints the input and output of the workflow."""
17      # Get dict representation of g
18      g_dct = dict(context.g)
19
20      # Print input
21      if 'input' in context.g:
22          print('The input:')
23          print(' '.join(f'{number:.2f}' for number in g_dct['input']))
24      else:
25          print('No input stored in g')
26
27      # Print output
28      print(f'The sum:\n{g_dct["sum"]:.2f}')
29      print(f'The square:\n{g_dct["square"]:.2f}')

main.py

 1  import caliber
 2  import functions
 3
 4  # Define tasks
 5  sums = caliber.Task(
 6      function=functions.sum_list,
 7      name='Calculate sum',
 8      args=[
 9          1.0, 4.2, 3.14, 6.3
10      ],
11  )
12
13  squares = caliber.Task(
14      function=functions.square,
15      name='Calculate square of sum',
16  )
17
18  prints = caliber.Task(
19      function=functions.print_result,
20      name='Print input and output',
21  )
22
23  # Collect tasks in process
24  calculation = caliber.Process(
25      name='Calculation',
26      tasks=[
27          sums,
28          squares,
29          prints,
30      ],
31  )
32
33  # Create workflow
34  do_calculation = caliber.Workflow(
35      process=calculation,
36      name='Calculate square of sum of list',
37  )
38
39  do_calculation.to_json('main.json')
Potential footgun
When one function changes g as part of its execution, these changes are visible to all other functions. A common use case is where one function creates input for another, e.g. a list with data. Other functions might need to operate on this list, e.g. adding or subtracting values, or even appending or popping items. Lists or dicts that are accessed from g should therefore be copied into objects that are local to the function, for example through a list or dict comprehension. This avoids hard-to-detect bugs caused by accidentally editing data that other functions rely on.
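The difference is easy to see in a small sketch. To keep it runnable without caliber, types.SimpleNamespace from the standard library stands in for context.g here, and the function names careless_total and careful_total are made up for illustration.

```python
from types import SimpleNamespace

# Stand-in for caliber's context.g (an assumption, so the sketch runs anywhere)
g = SimpleNamespace(numbers=[1.0, 4.2, 3.14])


def careless_total():
    """Works directly on the list stored in g."""
    numbers = g.numbers        # same list object as g.numbers
    numbers.append(100.0)      # this also changes g.numbers
    return sum(numbers)


def careful_total():
    """Works on a copy that is local to this function."""
    numbers = [number for number in g.numbers]  # new list object
    numbers.append(100.0)                       # g.numbers is untouched
    return sum(numbers)


careful_total()
length_after_careful = len(g.numbers)    # still the original three items

careless_total()
length_after_careless = len(g.numbers)   # the list in g has grown by one
```

Note that a comprehension makes a shallow copy: lists or dicts nested inside the copied container are still shared. copy.deepcopy() from the standard library is one way to handle that case.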