How can I write a pandas DataFrame to disk in .arrow format? I'd like to be able to read the Arrow file into Arquero as demonstrated here.
Upvotes: 5
Pandas can write a DataFrame directly to the binary Feather format (this uses pyarrow):
```python
import pandas as pd

df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
df.to_feather('my_data.arrow')
```
Additional keyword arguments are passed through to pyarrow.feather.write_feather(). These include the compression, compression_level, chunksize and version keywords.
Upvotes: 2
You can do this as follows:
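```python
import pandas
import pyarrow
import pyarrow.ipc

# Any DataFrame works here; this example loads one from a Parquet file.
df = pandas.read_parquet('your_file.parquet')

# Convert to an Arrow Table, dropping the pandas index.
table = pyarrow.Table.from_pandas(df, preserve_index=False)

sink = "myfile.arrow"
# Note: new_file creates a RecordBatchFileWriter (Arrow IPC file format).
# Using the table's own schema keeps the writer and the data consistent.
with pyarrow.ipc.new_file(sink, table.schema) as writer:
    writer.write(table)
```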
Upvotes: 5
Since Feather (version 2) is the Arrow IPC file format, you can probably just use write_feather. See http://arrow.apache.org/docs/python/feather.html
Upvotes: 7