Working with modules
dds works by tracking changes in the source code. Because much of the code is irrelevant for tracking (for example, the low-level python modules that open files), dds requires the user to indicate which code should be tracked. This whitelisting approach allows users to track only the pieces of code that matter. For example, the logging system may not be relevant to the outcome of the business logic.
This is rarely a concern in simple cases. When working with a single notebook or a single python file, there is nothing to do: all the functions written in a standalone python script or notebook are automatically considered. But how do you work with more complicated code bases, which include modules and other packages with important business logic to track? This is the topic of this tutorial.
Consider the following python module my_module, which has one data function:
! cat my_module.py
import dds

@dds.data_function("/my_function")
def my_function():
    return "my_function"
Just trying to run this function is going to give an error:
import dds
import my_module

# This statement will fail!
try:
    my_module.my_function()
except dds.DDSException as e:
    print(str(e)[:300])
Cannot load path <my_module/my_function>: this object cannot be retrieved, however the module 'my_module' exists. The typical cause of the issue is that the module my_module has not been whitelisted for use by DDS. Use the function 'dds.accept_module' to whitelist my_module or one of its submodules.
How do we deal with this? The error message hints at the accept_module function. This function instructs dds to consider a specific module or package for inclusion when inspecting the code. In particular, every module that contains data functions must be whitelisted.
Here is our previous example, fixed:
import dds
dds.accept_module("my_module")
my_module.my_function()
'my_function'
All sub-modules are automatically accepted. For example, in our case, my_module.sub_module is now also accepted.
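The acceptance rule described above can be pictured with a small standalone sketch. It is an illustration only, not the code dds uses internally: the helper name is_accepted and the dotted-prefix matching are assumptions made for this example. It shows how accepting my_module would cover my_module.sub_module while leaving a similarly named but separate module such as my_module_utils out.

```python
def is_accepted(module_name, accepted):
    """Hypothetical helper: True if module_name is an accepted module
    or a sub-module (dotted child) of an accepted module."""
    return any(
        module_name == root or module_name.startswith(root + ".")
        for root in accepted
    )


accepted = {"my_module"}

print(is_accepted("my_module", accepted))             # True
print(is_accepted("my_module.sub_module", accepted))  # True
print(is_accepted("my_module_utils", accepted))       # False
```

Matching on the dotted name rather than on a raw string prefix is what keeps my_module_utils separate from my_module in this sketch.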
Dependencies on other modules are not automatically accepted, and should also be added if they are important. Consider this example with two modules: one module containing the business logic, and one module containing some utilities that we do not want to track:
! cat my_module_utils.py
variable = 1

def get_util_variable():
    return variable
! cat my_module_important.py
import dds
from my_module_utils import get_util_variable

@dds.data_function("/f")
def f():
    print("executing f")
    return get_util_variable() * 2
dds.accept_module("my_module_important")
import my_module_important
my_module_important.f()
executing f
2
my_module_utils is not whitelisted. If we modify it, the change is not going to retrigger a calculation:
import my_module_utils
my_module_utils.variable = 2
my_module_important.f()
2
We can decide later to include my_module_utils as well, by calling accept_module('my_module_utils').
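To see why a change in a module that is not whitelisted goes unnoticed, here is a toy model of code tracking. It is a simplified sketch and not how dds actually computes its signatures: the function code_signature, the use of SHA-256, and the source strings standing in for the module contents are all assumptions made for illustration.

```python
import hashlib


def code_signature(sources, accepted):
    """Hypothetical helper: hash the source text of accepted modules only."""
    digest = hashlib.sha256()
    for name in sorted(sources):
        if name in accepted:
            digest.update(name.encode("utf-8"))
            digest.update(sources[name].encode("utf-8"))
    return digest.hexdigest()


# Source text standing in for the two modules, before and after the change.
before = {
    "my_module_important": "def f(): return get_util_variable() * 2",
    "my_module_utils": "variable = 1",
}
after = dict(before, my_module_utils="variable = 2")

only_important = {"my_module_important"}
both = {"my_module_important", "my_module_utils"}

# Utilities not accepted: the signature is blind to the change.
unchanged = code_signature(before, only_important) == code_signature(after, only_important)
# Utilities accepted: the same change now alters the signature.
changed = code_signature(before, both) != code_signature(after, both)

print(unchanged, changed)  # True True
```

In this model, a cached result is reused whenever the signature is unchanged, which matches the behavior shown above where f is not executed again.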
This is a tradeoff for the user:
- add more code to track more changes, which could be irrelevant to the result
- track less code and focus on the core business logic, at the expense of missing some important changes
As a recommendation, accept the packages that you are working on, and possibly some important dependencies that contain data functions themselves.
To conclude this tutorial:
- dds can track code in modules using the accept_module function
- all the data functions must be in whitelisted modules or sub-modules
- dependencies are not automatically tracked