Xion

Reputation: 389

Can I load multiple csv files using pyarrow?

I am aware that this can be done in R as follows:

ds <- open_dataset("nyc-taxi/csv/2019", format = "csv",
  partitioning = "month")

But is there a way to do this in Python? I tried the following, but neither seems to work:

import os
from pyarrow import csv

# Attempt 1: pass a glob pattern
table = csv.read_csv("*.csv")

# Attempt 2: pass the directory path
path = os.getcwd()
table = csv.read_csv(path)
table

Is there a way to make this work in Python?

Upvotes: 1

Views: 2101

Answers (1)

joris

Reputation: 139162

Yes, you can do this with pyarrow as well, much as in R, using the pyarrow.dataset submodule (the pyarrow.csv submodule only exposes functionality for reading single CSV files).

Example code:

import pyarrow.dataset as ds

dataset = ds.dataset("nyc-taxi/csv/2019", format="csv", partitioning=["month"])
table = dataset.to_table()

You can then pass row and column filters to the to_table() method, via its filter and columns arguments.

Upvotes: 5
