andybega

Reputation: 1437

Combine multiple columns and rows into a Polars struct (dictionary)

I'm trying to convert a data frame into nested/hierarchical data that will be written out as JSON lines. The data are structured like this:

import polars as pl

df = pl.DataFrame({
    "group_id": ["a", "a", "a", "b", "b", "b"],
    "label": ["dog", "cat", "mouse", "dog", "cat", "mouse"],
    "indicator": [1, 1, 0, 0, 0, 1]
})
df

┌──────────┬───────┬───────────┐
│ group_id ┆ label ┆ indicator │
│ ---      ┆ ---   ┆ ---       │
│ str      ┆ str   ┆ i64       │
╞══════════╪═══════╪═══════════╡
│ a        ┆ dog   ┆ 1         │
│ a        ┆ cat   ┆ 1         │
│ a        ┆ mouse ┆ 0         │
│ b        ┆ dog   ┆ 0         │
│ b        ┆ cat   ┆ 0         │
│ b        ┆ mouse ┆ 1         │
└──────────┴───────┴───────────┘

I'm trying to find a way to combine the "label" and "indicator" columns into a single dictionary (struct) per "group_id", where the "label" entries are the keys and the "indicator" entries are the values. The result should look like this:

target = pl.DataFrame({
    "group_id": ["a", "b"],
    "label": [{"dog": 1, "cat": 1, "mouse": 0}, {"dog": 0, "cat": 0, "mouse": 1}],
})
target
┌──────────┬───────────┐
│ group_id ┆ label     │
│ ---      ┆ ---       │
│ str      ┆ struct[3] │
╞══════════╪═══════════╡
│ a        ┆ {1,1,0}   │
│ b        ┆ {0,0,1}   │
└──────────┴───────────┘

target["label"][0]
{'dog': 1, 'cat': 1, 'mouse': 0}

target.write_ndjson()

'{"group_id":"a","label":{"dog":1,"cat":1,"mouse":0}}\n{"group_id":"b","label":{"dog":0,"cat":0,"mouse":1}}\n'

Upvotes: 1

Views: 1059

Answers (1)

jqurious

Reputation: 21580

Perhaps there is a simpler way, but this looks like a job for .pivot(): pivot the labels into columns, then pack those columns into a struct.

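# Pivot each label into its own column, then pack those columns into one struct.
# Note: newer Polars releases (1.0+) name the `columns` parameter `on`.
(df.pivot(index="group_id", columns="label", values="indicator", aggregate_function=None)
   .select("group_id", label=pl.struct(pl.exclude("group_id")))
#   .write_ndjson()
)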
shape: (2, 2)
┌──────────┬───────────┐
│ group_id ┆ label     │
│ ---      ┆ ---       │
│ str      ┆ struct[3] │
╞══════════╪═══════════╡
│ a        ┆ {1,1,0}   │
│ b        ┆ {0,0,1}   │
└──────────┴───────────┘

With .write_ndjson() uncommented, the output is:

'{"group_id":"a","label":{"dog":1,"cat":1,"mouse":0}}\n{"group_id":"b","label":{"dog":0,"cat":0,"mouse":1}}\n'
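
As a cross-check on the expected JSON lines, the same reshaping can be sketched in plain Python with only the standard library. This is not a Polars approach and not part of the answer above: the `rows` list and the `to_ndjson` helper are hypothetical names introduced here for illustration, and the rows simply mirror the question's data frame.

```python
import json
from collections import defaultdict

# Hypothetical stand-in for the question's data frame:
# one (group_id, label, indicator) tuple per row.
rows = [
    ("a", "dog", 1), ("a", "cat", 1), ("a", "mouse", 0),
    ("b", "dog", 0), ("b", "cat", 0), ("b", "mouse", 1),
]


def to_ndjson(rows):
    """Collect label -> indicator per group_id and emit one JSON object per line."""
    grouped = defaultdict(dict)  # dicts preserve insertion order
    for group_id, label, indicator in rows:
        grouped[group_id][label] = indicator
    # Compact separators match the spacing of the output shown in the question.
    return "".join(
        json.dumps({"group_id": g, "label": labels}, separators=(",", ":")) + "\n"
        for g, labels in grouped.items()
    )


ndjson = to_ndjson(rows)
print(ndjson, end="")
```

One difference worth keeping in mind: a struct column has the same fields in every row, so the pivot fills a label that a group lacks with null, whereas this sketch would simply omit that key for the group.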

Upvotes: 1
