Xion

Reputation: 389

How to specify which columns to load in pyarrow.dataset

I am trying to load only the columns I want, the way we do in pandas.

import pandas as pd
use_cols = ["ArrDelay", "DepDelay"]
df = pd.read_csv(path, usecols=use_cols)
df

Is there a similar option in pyarrow?

import pyarrow.dataset as ds
dataset = ds.dataset(path, format="csv")

Upvotes: 0

Views: 1199

Answers (1)

Pace

Reputation: 43817

I'm guessing what you want is...

table = dataset.to_table(columns=["ArrDelay", "DepDelay"])

The dataset methods scan(), to_batches(), and to_table() all take the same arguments, which are documented on the scan() method (in recent pyarrow releases, scanner() takes its place).

Upvotes: 4
