- Published on
Automatically transform complex python methods to polars expressions
Update 28.09.2023: My colleague Bela held a Lightning Talk at the Big PyData BBQ. You can check it out on YouTube!
In this post, we introduce polarIFy, a Python function decorator that
gives you a simpler way to write logical statements for Polars. With polarIFy, you
can use Python’s language structures like if / elif / else statements and transform
them into pl.when(..).then(..).otherwise(..) statements. This makes your code
more readable and less cumbersome to write.
At QuantCo, we frequently deal with insurance contracts where the calculation behavior is contingent upon the exact tariff. For instance, we might calculate certain properties differently for older contracts compared to newer ones.
This introduces a degree of conditional business logic into our data processing pipelines.
When working with Polars, these conditional operations often translate into nested
pl.when(..).then(..).otherwise(..) statements. While this structure is powerful
and allows for a great deal of flexibility, it can quickly become complex and difficult
to read, especially when dealing with multiple conditions or branches.
This is where polarIFy comes in. By allowing us to write these conditional statements
in a more Pythonic way using if / elif / else structures, polarIFy greatly simplifies
our code. It makes our Polars pipelines cleaner, more readable, and less prone to
errors, especially when dealing with the intricate conditional logic often required
in our work with insurance contracts.
Using polarIFy
polarIFy can automatically transform Python functions using if / elif / else statements
into Polars expressions. Here’s an example:
@polarify
def func(x: pl.Expr) -> pl.Expr:
s = 1
if x > 10:
return s + 10
else:
t = 2
if x > 0:
return t
else:
return -s
This gets transformed into:
def func_polarified(x: pl.Expr) -> pl.Expr:
return (
pl.when(x > 10)
.then(1 + 10)
.otherwise(
pl.when(x > 0)
.then(2)
.otherwise(-1)
)
)
polarIFy can also handle multiple statements and nested statements, making it a versatile tool for simplifying your Polars code.
How it works
Let’s take a look at the above example in a control flow graph:
In order to transform this control flow into a Polars expression, we need to keep
track of all possible assignments to variables to determine when to return what.
We do that by creating a dictionary assignments for each node that maps variable
names to their values.
At the beginning, we haven’t assigned any values to any variables, so we start with
an empty dictionary: assignments = {}.
In the first step, we assign the value 1 to the variable s, so we update the
dictionary: assignments = {'s': 1}.
Going further, we evaluate if x > 10. This translates to the polars expression
pl.when(x > 10).then(..).otherwise(..).
We need to keep track recursively of all possible assignments to variables and
the return value of the then and else branches of our control flow.
Since the then branch directly returns s + 10, we can just put a s + 10 in
our polars expression:
pl.when(x > 10).then(assignments['s'] + 10).otherwise(..).
Now, let’s look at the else branch.
We assign the value 2 to the variable t, so we create a new dictionary from
the old one with our new assignment: assignments = {'s': 1, 't': 2}.
We now have another if statement which gets translated into another
pl.when(x > 0).then(..).otherwise(..) expression.
The then branch returns the value of t, which we can get from our
assignments dictionary. Thus, we can put assignments['t'] into our polars
expression: pl.when(x > 0).then(assignments['t']).otherwise(..).
The else branch returns -s, which we can also get from our assignments
dictionary: pl.when(x > 0).then(..).otherwise(-assignments['s']).
All in all, we land at the following polars expression:
(
pl.when(x > 10)
.then(1 + 10)
.otherwise(
pl.when(x > 0)
.then(2)
.otherwise(-1)
)
)
Limitations
Since polarIFy transforms Python functions into Polars expressions, it can only
handle logic that Polars itself can also do.
For instance, it can’t handle for and while loops, since it is not possible
to do loops for a single element with Polars expressions.
Also, functions with side effects (like print, raise or pl.write_csv) are
not supported since these don’t make sense in a Polars expression context.
Since polarIFy is a relatively new project, it might not work for all use cases yet. As of August 2023, there are some known limitations:
match ... casestatements are not supported- the walrus operator
:=is not supported
Conclusion
polarIFy is a powerful tool that simplifies the process of writing conditional statements in Polars. By allowing you to write conditions in a more Pythonic way, it makes your code cleaner and easier to understand. We’re excited about its potential to make working with Polars even more efficient and enjoyable.
We’re always looking for feedback and contributions, so feel free to check out the polarIFy GitHub repository and let us know what you think!
This is a cross-post from the QuantCo blog. Check out the other posts there!