NoName

Reputation: 10334

Where does pandas store the DataFrame while the program is running?

Is it in memory?

If so, it doesn't matter whether I import chunk by chunk or not, because eventually, when I concatenate the chunks, they'll all be stored in memory.
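A minimal sketch of the concern, assuming a small hypothetical CSV held in a string in place of a real large file: reading with chunksize yields ordinary DataFrames, and concatenating them rebuilds the full frame in memory.

```python
import io

import pandas as pd

# Hypothetical stand-in for a large CSV file on disk.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(1_000))

# Each chunk is a normal DataFrame held in RAM; list() keeps all of them.
chunks = list(pd.read_csv(io.StringIO(csv_text), chunksize=100))

# Concatenating rebuilds the whole data set as one in-memory DataFrame,
# so chunking by itself saves nothing if everything is glued back together.
full = pd.concat(chunks, ignore_index=True)

print(len(chunks), len(full))
print(full.memory_usage(deep=True).sum(), "bytes")
```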

Does that mean there is no way to use pandas for a large data set?

Upvotes: 2

Views: 2877

Answers (2)

Marco

Reputation: 2020

Yes, it is in memory, and yes, when the dataset gets too large you have to use other tools. Of course, you can load the data in chunks, process one chunk at a time, and write out the results (freeing memory for the next chunk). That works fine for some types of processing, such as filtering and annotating. If you need sorting or grouping, however, you need some other tool; personally, I like BigQuery from Google Cloud.
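A minimal sketch of the filter-and-annotate pattern, assuming hypothetical column names (id, value, flag) and using io.StringIO objects in place of real input and output files: each chunk is filtered, annotated, and appended to the output, so only one chunk's worth of rows is in memory at a time.

```python
import io

import pandas as pd

# Hypothetical input; in practice this would be a path to a large CSV.
csv_text = "id,value\n" + "\n".join(f"{i},{i % 7}" for i in range(10_000))

out = io.StringIO()  # stands in for an output file opened for writing
first = True
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=1_000):
    # Filtering and annotating only need the rows of the current chunk.
    kept = chunk[chunk["value"] > 3].copy()
    kept["flag"] = "high"
    # Append this chunk's result; write the header only once.
    kept.to_csv(out, header=first, index=False)
    first = False
```

Sorting or grouping across the whole file does not fit this loop, since each chunk only sees its own rows.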

Upvotes: 1

MatsLindh

Reputation: 52862

Yes, they will be stored in memory, and that's exactly why you want to chunk them: chunking lets you avoid reading the whole data set at once and instead process it chunk by chunk before writing out the end result.
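A minimal sketch of processing chunks before producing the end result, assuming a hypothetical single-column CSV (amount): each chunk is reduced to a running total and row count, then discarded, so the full data set never sits in memory.

```python
import io

import pandas as pd

# Hypothetical input: the numbers 1..1000 in one column.
csv_text = "amount\n" + "\n".join(str(i) for i in range(1, 1_001))

total = 0
count = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=250):
    # Reduce each chunk to a couple of numbers, then let it go.
    total += chunk["amount"].sum()
    count += len(chunk)

# The end result is computed from the partial results, not from all rows.
mean = total / count
print(total, count, mean)
```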

You can use the chunksize parameter (for example, of read_csv) to tell pandas how many rows to read per chunk; read_csv then returns an iterator of DataFrames instead of a single one. If you need the complete set of rows to perform arbitrary lookups, you'll have to back it with some other technology (such as a database).
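A minimal sketch of backing lookups with a database, assuming SQLite and a hypothetical users table with id and name columns: chunks are appended to the table with to_sql, and individual rows are then fetched with a query. The ":memory:" database is only there to keep the demo self-contained; a file path would keep the data on disk instead of in RAM.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical input standing in for a large CSV file.
csv_text = "id,name\n" + "\n".join(f"{i},user{i}" for i in range(5_000))

# Use a file path such as "users.db" to actually keep the data on disk.
conn = sqlite3.connect(":memory:")
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=1_000):
    # Append each chunk to the table instead of keeping it in a DataFrame.
    chunk.to_sql("users", conn, if_exists="append", index=False)

# An index makes lookups by id fast without scanning the whole table.
conn.execute("CREATE INDEX idx_users_id ON users (id)")

# Arbitrary lookups now go to the database.
row = pd.read_sql_query("SELECT name FROM users WHERE id = ?", conn, params=(4321,))
print(row)
```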

Upvotes: 5
