Can I load multiple csv files using pyarrow?

Question

I am aware that this can be done in R as follows:

ds <- open_dataset("nyc-taxi/csv/2019", format = "csv",
  partitioning = "month")

But is there a way to do this in Python? I tried the following, but neither seems to work:

# Attempt 1: pass a glob pattern
from pyarrow import csv
table = csv.read_csv("*.csv")

# Attempt 2: pass the directory itself
import os
from pyarrow import csv
path = os.getcwd()
table = csv.read_csv(path)
table

Is there a way to make this work in Python?

joris · Accepted Answer

Yes, you can do this with pyarrow as well, much like in R, using the pyarrow.dataset submodule. The pyarrow.csv submodule only exposes functionality for reading single CSV files.

Example code:

import pyarrow.dataset as ds

dataset = ds.dataset("nyc-taxi/csv/2019", format="csv", partitioning=["month"])
table = dataset.to_table()

You can then specify row and column filters in the to_table() method, through its filter and columns arguments.
