GKJ

Reputation: 23

Python Pandas: How do I read only every nth line of a CSV file?

The data file is so big that I only want to read rows at a fixed interval to reduce processing time. I'm using pandas.read_csv. How can I read only one line out of every n lines?

Upvotes: 2

Views: 823

Answers (2)

Caner Burc BASKAYA

Reputation: 176

This doesn't give you every nth row, but if the dataset is huge you can read and process it in chunks like this:

import pandas as pd

df_chunks = pd.read_csv("train/train.csv", chunksize=5000)

In this case read_csv does not return the whole DataFrame but an iterator; each item it yields is a DataFrame holding up to 5000 rows of the CSV file (the last chunk may be smaller).
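If you still want every nth row while reading in chunks, one option is to slice each chunk with `iloc`, keeping track of how many rows came before it so the stride stays aligned across chunk boundaries. This is a minimal sketch; the function name `every_nth_row_chunked` is hypothetical, and it keeps data rows 0, n, 2n, ... (the first data row and every nth after it).

```python
import io
import pandas as pd


def every_nth_row_chunked(source, n, chunksize=5000):
    """Return data rows 0, n, 2n, ... of a CSV, reading it chunk by chunk."""
    pieces = []
    offset = 0  # number of data rows seen before the current chunk
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # Position inside this chunk of the first row whose global index is a multiple of n
        start = (-offset) % n
        part = chunk.iloc[start::n]
        if not part.empty:
            pieces.append(part)
        offset += len(chunk)
    return pd.concat(pieces, ignore_index=True) if pieces else pd.DataFrame()


# Usage with an in-memory CSV: 23 data rows, chunks of 7, every 5th row
data = "x\n" + "\n".join(str(i) for i in range(23)) + "\n"
result = every_nth_row_chunked(io.StringIO(data), n=5, chunksize=7)
print(result["x"].tolist())  # [0, 5, 10, 15, 20]
```

Only one chunk plus the selected rows are held in memory at a time, which is the point of using `chunksize` on a large file.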

Upvotes: 1

Pablo C

Reputation: 4761

Try skipping rows by their indices:

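import pandas as pd

n = 5
# Skip every row index that is not a multiple of n (index 0 is the header line)
skip_func = lambda x: x % n != 0
df = pd.read_csv("data.csv", skiprows=skip_func)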

When skiprows is a callable, pandas.read_csv evaluates it on each row index and skips the rows for which it returns True. Row indices count lines in the file, so the header line is index 0; since 0 % n == 0, the header is kept.
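Because the header occupies index 0, the callable above keeps file lines n, 2n, ..., which are data rows n-1, 2n-1, ... (counting data rows from 0). The sketch below checks this on a small in-memory CSV and shows a hypothetical variant that always keeps the header and then starts from the first data row.

```python
import io
import pandas as pd

n = 5
# Header "value" on line 0, then values 0..11 on lines 1..12
csv_text = "value\n" + "\n".join(str(v) for v in range(12)) + "\n"

# Callable from the answer: keeps lines 0 (header), 5 and 10
df_a = pd.read_csv(io.StringIO(csv_text), skiprows=lambda i: i % n != 0)
print(df_a["value"].tolist())  # [4, 9]

# Hypothetical variant: keep the header, then data row 0 and every nth after it
df_b = pd.read_csv(io.StringIO(csv_text), skiprows=lambda i: i > 0 and (i - 1) % n != 0)
print(df_b["value"].tolist())  # [0, 5, 10]
```

Skipped rows are discarded by the parser before a DataFrame is built, so this reduces memory use, although the whole file is still scanned.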

Upvotes: 5
