Reputation: 1092
I have a really large CSV file, about 10 GB. Whenever I try to read it into an IPython notebook using
data = pd.read_csv("data.csv")
my laptop gets stuck. Is it possible to read only, say, 10,000 rows or 500 MB of the CSV file?
Upvotes: 5
Views: 17527
Reputation: 29690
It is possible. You can create an iterator that yields chunks of your CSV as DataFrames of a given size by passing iterator=True with your desired chunksize to read_csv.
df_iter = pd.read_csv('data.csv', chunksize=10000, iterator=True)
for iter_num, chunk in enumerate(df_iter, 1):
print(f'Processing iteration {iter_num}')
# do things with chunk
Or, more briefly:
for chunk in pd.read_csv('data.csv', chunksize=10000):
# do things with chunk
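For example, here is a minimal sketch of a common pattern: filter each chunk and concatenate the filtered pieces into one much smaller DataFrame (the 'value' column name is just a placeholder for whatever your own filter uses):
import pandas as pd

filtered_parts = []
for chunk in pd.read_csv('data.csv', chunksize=10000):
    # keep only the rows we care about from each chunk
    filtered_parts.append(chunk[chunk['value'] > 0])

# combine the filtered chunks into a single DataFrame
result = pd.concat(filtered_parts, ignore_index=True)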
Alternatively, if there is just a specific part of the CSV you want to read, you can use the skiprows and nrows options to start at a particular line and then read n rows, as the naming suggests.
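For example, a sketch that skips the first 1,000,000 data rows and reads the next 10,000 (assuming the file has a header row, which range(1, 1000001) preserves by leaving row 0 alone):
import pandas as pd

# skip data rows 1..1,000,000 but keep row 0 (the header), then read 10,000 rows
df_part = pd.read_csv('data.csv', skiprows=range(1, 1000001), nrows=10000)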
Upvotes: 14
Reputation: 496
This is likely a memory issue. With read_csv you can set chunksize to specify how many rows are read at a time.
Alternatively, if you don't need all the columns, you can pass usecols to read_csv to import only the columns you need.
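For example, a quick sketch of usecols (the column names here are hypothetical placeholders for your own):
import pandas as pd

# load only the columns you actually need, which can greatly reduce memory use
data = pd.read_csv('data.csv', usecols=['id', 'value'])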
Upvotes: 0