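I have a binary file containing records from a C struct, and I would like to read that file into a Polars DataFrame.
I can accomplish that as shown below, but is there a more direct path?
My current solution reads the file into a NumPy structured array with np.fromfile(), converts that array to a pandas DataFrame, and then converts the pandas DataFrame to Polars: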
import numpy as np
import pandas as pd
import polars as pl

# Data read in from file using np.fromfile()
data = np.array([(1, 2002, 2, 13, 0.3),
                 (2, 2005, 1, -10, 1.5),
                 (3, 2004, 2, 54, -0.12)],
                dtype=[("id", "<i4"), ("yr", "<u2"), ("sex", "<u2"), ("val1", "<i2"), ("val2", "<f4")])
# Go through pandas to get one Polars column per struct field.
df = pl.from_pandas(pd.DataFrame(data))
df
id    yr     sex   val1   val2
i32   u16    u16   i16    f32
1     2002   2     13     0.3
2     2005   1     -10    1.5
3     2004   2     54     -0.12
I've tried reading the data directly into Polars from NumPy using pl.DataFrame(data) or pl.from_records(data), but in both cases I get a single-column dataframe of type "object", which I can't work out how to split into separate columns or convert to a struct.
import numpy as np
import polars as pl

data = np.array([(1, 2002, 2, 13, 0.3),
                 (2, 2005, 1, -10, 1.5),
                 (3, 2004, 2, 54, -0.12)],
                dtype=[("id", "<i4"), ("yr", "<u2"), ("sex", "<u2"), ("val1", "<i2"), ("val2", "<f4")])
# Build one Polars column per field of the structured array.
pl.DataFrame(
    {
        field_name: data[field_name]
        for field_name in data.dtype.fields
    }
)
┌─────┬──────┬─────┬──────┬───────┐
│ id ┆ yr ┆ sex ┆ val1 ┆ val2 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i32 ┆ u16 ┆ u16 ┆ i16 ┆ f32 │
╞═════╪══════╪═════╪══════╪═══════╡
│ 1 ┆ 2002 ┆ 2 ┆ 13 ┆ 0.3 │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2 ┆ 2005 ┆ 1 ┆ -10 ┆ 1.5 │
├╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 3 ┆ 2004 ┆ 2 ┆ 54 ┆ -0.12 │
└─────┴──────┴─────┴──────┴───────┘
To convert back to a NumPy structured array, assign a NumPy array per field:
# Create a NumPy structured array of the correct size.
# Here df is the Polars DataFrame holding the data shown above.
numpy_struct_array = np.empty(df.height, data.dtype)
# Fill in the correct values, one field at a time.
for field, col in zip(data.dtype.fields, df.columns):
    numpy_struct_array[field] = df.get_column(col).to_numpy()
numpy_struct_array
array([(1, 2002, 2, 13, 0.3 ), (2, 2005, 1, -10, 1.5 ),
(3, 2004, 2, 54, -0.12)],
dtype=[('id', '<i4'), ('yr', '<u2'), ('sex', '<u2'), ('val1', '<i2'), ('val2', '<f4')])