How to get the Index of a DataFrame when using the chunksize argument?

Question

I have a very large .csv file that I can't load fully into RAM, so I need to load the dataset with the chunksize argument like this:

import pandas as pd
csv = pd.read_csv("challenger_match_V2.csv", chunksize=100, iterator=True)

But how do I access a row of the dataset by its index?
Without the chunksize argument I can just do dataframe[idx:idx].
How can I do that with chunksize?

I tried doing:

for chunk in csv:
    print(chunk[idx])

That didn't work: I got a KeyError for the index I used to access the dataframe.

Example:

for chunk in csv:
    print(chunk[5])

Which gave the error:

   2646                 return self._engine.get_loc(key)
   2647             except KeyError:
-> 2648                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650         if indexer.ndim > 1 or indexer.size > 1:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 5
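
The KeyError is consistent with how pandas treats chunk[5]: plain square brackets on a DataFrame look up a column labelled 5, not row 5. Rows are selected with .loc (by label) or .iloc (by position). With the default index, each chunk keeps the running row numbers of the whole file, so the second chunk of size 100 is labelled 100 to 199. A minimal sketch of a row lookup across chunks follows; the helper name get_row_from_chunks and the small demo file are assumptions made for illustration, not part of the original question.

```python
import os
import tempfile

import pandas as pd


def get_row_from_chunks(path, idx, chunksize=100):
    """Return the row whose index label is `idx`, or None if it is not found.

    Hypothetical helper: scans the file chunk by chunk so that only one
    chunk is held in memory at a time.
    """
    reader = pd.read_csv(path, chunksize=chunksize)
    try:
        for chunk in reader:
            # With the default index, chunk labels continue across chunks,
            # so a global row number can be checked against chunk.index.
            if idx in chunk.index:
                return chunk.loc[idx]
    finally:
        reader.close()
    return None


# Small demo file standing in for the real CSV (made-up columns).
demo = pd.DataFrame({"match_id": range(10), "score": range(100, 110)})
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    demo.to_csv(demo_path, index=False)
    row = get_row_from_chunks(demo_path, 5, chunksize=3)
    missing = get_row_from_chunks(demo_path, 50, chunksize=3)
```

Here row is a Series holding the sixth data row of the file, and missing is None because the demo file has only ten rows. The lookup is a linear scan, so repeated random access over a very large file will be slow.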

Lupos · Accepted Answer

I ended up throwing away some data from my dataframe to reduce the amount of memory needed.
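
The answer does not say how the data was dropped. One possible way to reduce memory at load time, sketched here under the assumption that some columns are not needed, is to pass usecols and narrower dtypes to read_csv. The column names below are made up for the demo.

```python
import io

import pandas as pd

# Made-up CSV content standing in for the real file.
csv_text = "match_id,score,notes\n1,10,a\n2,20,b\n3,30,c\n"

# Read only the columns that are needed and store them as 32-bit integers
# instead of the default 64-bit, which lowers the memory footprint.
df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["match_id", "score"],
    dtype={"match_id": "int32", "score": "int32"},
)
```

After loading a reduced frame like this, ordinary indexing such as df.iloc[idx] works as it does without chunksize.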
