Reputation: 389
I am aware that this can be done in R as follows:
ds <- open_dataset("nyc-taxi/csv/2019", format = "csv",
partitioning = "month")
But is there a way to do this in Python? I tried the following, but neither seems to be an option:
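# Attempt 1: read_csv does not expand glob patterns
from pyarrow import csv
table = csv.read_csv("*.csv")
# Attempt 2: read_csv expects a single file, not a directory
import os
from pyarrow import csv
path = os.getcwd()
table = csv.read_csv(path)
table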
Is there a way to make this work in Python?
Upvotes: 1
Views: 2101
Reputation: 139162
Yes, you can do this with pyarrow as well, much as in R, using the pyarrow.dataset submodule (the pyarrow.csv submodule only exposes functionality for dealing with single CSV files).
Example code:
import pyarrow.dataset as ds
dataset = ds.dataset("nyc-taxi/csv/2019", format="csv", partitioning=["month"])
table = dataset.to_table()
You can then specify row and column filters in the to_table() method.
Upvotes: 5