RobinL

Reputation: 11597

How to write a pandas dataframe to .arrow file

How can I write a pandas dataframe to disk in .arrow format? I'd like to be able to read the arrow file into Arquero as demonstrated here.

Upvotes: 5

Views: 10754

Answers (3)

ns15

Reputation: 8844

pandas can write a DataFrame directly to the binary Feather format with DataFrame.to_feather (this requires pyarrow to be installed).

import pandas as pd
df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
df.to_feather('my_data.arrow')

Additional keywords are passed to pyarrow.feather.write_feather(). This includes the compression, compression_level, chunksize and version keywords.

Upvotes: 2

RobinL

Reputation: 11597

You can do this as follows:

import pyarrow
import pandas

df = pandas.read_parquet('your_file.parquet')

# Convert the DataFrame to an Arrow table, dropping the pandas index
table = pyarrow.Table.from_pandas(df, preserve_index=False)

sink = "myfile.arrow"

# Note new_file creates a RecordBatchFileWriter (the Arrow IPC file format).
# The with block closes the writer even if the write fails.
with pyarrow.ipc.new_file(sink, table.schema) as writer:
    writer.write(table)

Upvotes: 5

Neal Richardson

Reputation: 792

Since Feather is the Arrow IPC format, you can probably just use write_feather. See http://arrow.apache.org/docs/python/feather.html

Upvotes: 7
