mainrs

Reputation: 53

How can I share a lazy dataframe between different runtimes?

I have a desktop application where the majority of the calculations (>90%) happen on the Rust side. However, I want users to be able to write Python scripts that operate on the df.

Can this be done without serializing the dataframe between runtimes to a file?

A simple invocation could be this:

Rust: agg -> Rust: calculate new column -> Python: groupby -> Rust: count results

The serialization approach works for small datasets, but it doesn't scale to larger ones. Ideally, I could tell the Python side: here is a lazy dataframe in memory, and you can manipulate it.

I've read the documentation, and the only solution I could see is to use Apache Arrow IPC.

Upvotes: 3

Views: 686

Answers (1)

Cory Grinstead

Reputation: 661

LazyFrames are (mostly) serializable to JSON. The serialize operations are fallible, so depending on the type of query, a given LazyFrame may not serialize. Each language has a slightly different API for this operation, though.

This serializes the LogicalPlan itself and doesn't perform any computations on the dataframes, which makes the operation very inexpensive.
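To illustrate why shipping a plan is cheap, here is a toy sketch in plain Python. It is not the Polars API: `ToyLazyFrame`, `filter_gt`, `select`, and the `data.parquet` path are all made up for illustration. The point is only that the JSON payload describes the recorded steps, so one side can build a plan and the other can extend it before anything is computed.

```python
import json


class ToyLazyFrame:
    """Hypothetical stand-in for a lazy query plan (NOT the Polars API)."""

    def __init__(self, source, steps=None):
        self.source = source            # where the data lives, e.g. a file path
        self.steps = list(steps or [])  # recorded operations, not yet executed

    def filter_gt(self, column, value):
        # Record the operation instead of running it.
        step = {"op": "filter_gt", "column": column, "value": value}
        return ToyLazyFrame(self.source, self.steps + [step])

    def select(self, *columns):
        step = {"op": "select", "columns": list(columns)}
        return ToyLazyFrame(self.source, self.steps + [step])

    def to_json(self):
        # Only the plan is written out; no rows are read or copied.
        return json.dumps({"source": self.source, "steps": self.steps})

    @classmethod
    def from_json(cls, text):
        plan = json.loads(text)
        return cls(plan["source"], plan["steps"])


# "Rust side" in this sketch: build a plan and serialize it.
plan = ToyLazyFrame("data.parquet").filter_gt("price", 10)
payload = plan.to_json()

# "Python side" in this sketch: rebuild the plan and add another step.
extended = ToyLazyFrame.from_json(payload).select("price", "qty")
```

In this toy version the payload grows with the number of recorded steps, not with the number of rows in the source.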

Python

import polars as pl

lf = get_lf_somehow()
# Serialize the query plan (not the data) to a JSON string
json_buf = lf.write_json(to_string=True)
lf = pl.LazyFrame.from_json(json_buf)

Node.js

let lf = get_lf_somehow()
// Serialize the query plan (not the data) to JSON
const json_buf = lf.serialize('json')
lf = pl.LazyFrame.deserialize(json_buf, 'json')

Rust

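let lf = get_lf_somehow();
// Serialize the query plan (not the data) to JSON bytes
let json_buf = serde_json::to_vec(&lf)?;
// The target type must be stated so serde knows what to deserialize into
let lf: LazyFrame = serde_json::from_slice(&json_buf)?;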

Upvotes: 3
