Reputation: 23
The data file is so big that I want to read only rows at regular intervals, to reduce the processing time. I'm using pandas.read_csv. How can I read only one line out of every n lines?
Upvotes: 2
Views: 823
Reputation: 176
This doesn't give you every nth row, but if the dataset is huge you can read and process it in chunks like this:
df_chunks = pd.read_csv("train/train.csv", chunksize=5000)
In this case read_csv returns not the whole data frame but an iterator; each item it yields is a data frame holding a portion of the csv file of up to 5000 rows.
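A chunked read can also be combined with the every-nth-row requirement from the question. The following is only a minimal sketch: the in-memory CSV, the column names and the small chunk size are made up for illustration, and a running offset is used so that the sampling stays aligned even when the chunk size is not a multiple of n.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical in-memory CSV standing in for a large file on disk.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(20))

n = 5          # keep one data row out of every n
chunksize = 8  # deliberately not a multiple of n

kept = []
offset = 0  # number of data rows seen in earlier chunks
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=chunksize):
    # Global position of each row in this chunk within the whole file.
    positions = np.arange(offset, offset + len(chunk))
    # Keep only the rows whose global position is a multiple of n.
    kept.append(chunk[positions % n == 0])
    offset += len(chunk)

sampled = pd.concat(kept)
```

Only one chunk plus the sampled rows are held in memory at a time, at the cost of still parsing every row.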
Upvotes: 1
Reputation: 4761
Try ignoring rows by their indices:
n = 5
skip_func = lambda x: x%n != 0
df = pd.read_csv("data.csv", skiprows = skip_func)
When skiprows is a callable, pandas.read_csv skips the rows whose indices make the function return True. Index 0 is the header line, and since 0 % n == 0 it is kept.
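To see which rows survive, here is a small self-contained sketch using a made-up in-memory CSV (the data and column names are assumptions for illustration only). Because the header occupies index 0, the first variant keeps file lines 5, 10, ... and so drops the first data row; the second variant shifts the test by one if you want to start from the first data row instead.

```python
import io

import pandas as pd

# Hypothetical CSV: one header line followed by 12 data rows (id 0..11).
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(12))

n = 5

# Variant 1: same callable as above. Keeps file lines 0 (header), 5, 10.
df = pd.read_csv(io.StringIO(csv_text), skiprows=lambda i: i % n != 0)

# Variant 2: keep the header, then every nth data row starting with the first.
df_first = pd.read_csv(
    io.StringIO(csv_text),
    skiprows=lambda i: i > 0 and (i - 1) % n != 0,
)

print(df["id"].tolist())        # data rows taken from file lines 5 and 10
print(df_first["id"].tolist())  # data rows taken from file lines 1, 6 and 11
```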
Upvotes: 5