Reputation: 53
I have a desktop application where the majority of calculations (>90%) happen on the Rust side. However, I want users to be able to write Python scripts that operate on the DataFrame.
Can this be done without serializing the dataframe between runtimes to a file?
A simple invocation could be this:
Rust: agg -> Rust: calculate new column -> Python: groupby -> Rust: count results
The serializing approach works for small datasets, but it doesn't scale to larger ones. Ideally, I could tell the Python side: here is a lazy DataFrame in memory, and you can manipulate it.
I've read the documentation, and the only solution I could see is to use Apache Arrow IPC.
Upvotes: 3
Views: 686
Reputation: 661
LazyFrames are (mostly) serializable to JSON. The serialize operations are fallible, so depending on the type of query, a given LazyFrame may not serialize. Each language has a slightly different API for this operation, though.
This serializes the LogicalPlan itself and doesn't perform any computation on the DataFrames, which makes the operation very inexpensive.
# Python
lf = get_lf_somehow()
json_buf = lf.write_json(to_string=True)
lf = pl.LazyFrame.from_json(json_buf)
// JavaScript
let lf = get_lf_somehow()
const json_buf = lf.serialize('json')
lf = pl.LazyFrame.deserialize(json_buf, 'json')
// Rust
let lf = get_lf_somehow();
let json_buf = serde_json::to_vec(&lf)?;
// The type annotation tells serde which type to deserialize into.
let lf: LazyFrame = serde_json::from_slice(&json_buf)?;
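To illustrate why passing only the plan is cheap, here is a toy sketch in plain Python using just the standard library. It is not the Polars API: the plan nodes, the helper functions (`scan`, `with_column`, `group_by_count`, `describe`), and the file name `big_dataset.parquet` are all hypothetical stand-ins. The point is only that the JSON payload describes the query and never contains row data, so one runtime can build it, another can extend it, and it can be handed back for execution.

```python
import json

# Hypothetical, simplified stand-in for a lazy query plan: each node records
# an operation and its input, but holds no row data.
def scan(path):
    return {"op": "scan", "path": path}

def with_column(plan, name, expr):
    return {"op": "with_column", "name": name, "expr": expr, "input": plan}

def group_by_count(plan, key):
    return {"op": "group_by_count", "key": key, "input": plan}

def describe(plan):
    """Return the operation names in execution order (innermost first)."""
    ops = []
    while plan is not None:
        ops.append(plan["op"])
        plan = plan.get("input")
    return list(reversed(ops))

# "Rust side" (simulated here): build the plan and serialize it. Only the
# description of the query is written, so the payload stays small no matter
# how large the underlying dataset is.
plan = with_column(scan("big_dataset.parquet"), "total", "price * qty")
payload = json.dumps(plan)

# "Python side": rebuild the plan, add a step, and serialize it again so it
# can be handed back to the other runtime.
received = json.loads(payload)
extended = group_by_count(received, "customer_id")
payload_back = json.dumps(extended)

print(describe(json.loads(payload_back)))
```

In a real setup the two halves would live in different processes or runtimes, and the payload would be the serialized LazyFrame produced by the calls shown above rather than this hand-rolled dictionary.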
Upvotes: 3