Cory Grinstead
Cory Grinstead

Reputation: 661

Polars lazyframe - add fields from other lazyframe as struct without a `collect`

I am trying to populate a new field containing a struct of all of the other fields from another lazyframe based on a predicate.

While the examples are in python, I am open to answers in python or rust.

companies = pl.DataFrame({
  "id": [1], 
  "name": ["google"], 
  "industry": [1001]
}).lazy()

industries = pl.DataFrame({
  "id": [1001],
  "name": ["tech"],
  "sectors": [[10011, 10012]]
}).lazy()

expected = pl.DataFrame({
  "id": [1],
  "name": ["polars"],
  "industry": [{
    "name": "tech",
    "sectors": [[10011, 10012]]
  }]
})

I can do this naively through join -> select -> to_struct -> lit, but that is not ideal as I have to perform a collect on the lazyframe to get my desired results.

right_columns = industries.schema.keys()

industry_matches = companies.join(
    industries,
    left_on="industry",
    right_on="id",
    how="inner"
).collect().select(right_columns).drop("id").to_struct("industry")

df = companies.with_columns(pl.lit(industry_matches))

df.collect()

Ideally, i'd like to do something like a join within an expression like so.

companies.select(
  pl.all(),
  pl.col("industry").join(industries, on="id").exclude("id"),
)

but I am open to any alternatives that do not require a collect

Upvotes: 4

Views: 1331

Answers (1)

user18559875
user18559875

Reputation:

Would this work for you:

(
    companies
    .join(
        industries
        .select(
            pl.col('id').alias('industry'),
            pl.struct(pl.exclude('id')).alias('industry_struct'),
        ),
        on="industry",
        how="inner"
    )
    .drop('industry')
    .collect()
)
shape: (1, 3)
┌─────┬────────┬─────────────────────────┐
│ id  ┆ name   ┆ industry_struct         │
│ --- ┆ ---    ┆ ---                     │
│ i64 ┆ str    ┆ struct[2]               │
╞═════╪════════╪═════════════════════════╡
│ 1   ┆ google ┆ {"tech",[10011, 10012]} │
└─────┴────────┴─────────────────────────┘

In essence, we are creating the struct within the join itself.

Upvotes: 2

Related Questions