Preserve index when loading pyarrow parquet from pandas DataFrame

Question

I need to convert a dict with dict values to Parquet. My data looks like this:

{"KEY":{"2018-12-06":250.0,"2018-12-07":234.0}}

I'm converting it to a pandas DataFrame, building a pyarrow Table from that, and writing the table to Parquet:

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

data = {"KEY":{"2018-12-06":250.0,"2018-12-07":234.0}}
df = pd.DataFrame.from_dict(data, orient='index')
table = pa.Table.from_pandas(df, preserve_index=False)
pq.write_table(table, 'file.parquet', flavor='spark')

I end up with data that has only the dates and values, without the key of the dict:

{"2018-12-06":250.0,"2018-12-07":234.0}

What I need is to also have the key of the data:

{"KEY": {"2018-12-06":250.0,"2018-12-07":234.0}}

cs95 · Accepted Answer

Your code passes preserve_index=False, which drops the DataFrame index (here, the "KEY" label) when the table is built. To keep the index, set preserve_index=True:

table = pa.Table.from_pandas(df, preserve_index=True)

pq.write_table(table, 'file.parquet', flavor='spark')
pq.read_table('file.parquet').to_pandas()  # Index is preserved.

     2018-12-06  2018-12-07
KEY       250.0       234.0
