Reputation: 10334
Is it in memory?
If so, then it doesn't matter if I import chunk by chunk or not because eventually, when I concatenate them, they'll all be stored in memory.
Does that mean for a large data set, there is no way to use pandas?
Upvotes: 2
Views: 2877
Reputation: 2020
Yes, it is in memory, and yes, when the dataset gets too large you have to use other tools. Of course you can load the data in chunks, process one chunk at a time, and write out the results (freeing memory for the next chunk). That works fine for some types of processing, like filtering and annotating, but if you need sorting or grouping you need some other tool; personally I like BigQuery from Google Cloud.
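The filter-per-chunk workflow described above can be sketched as follows. This is a minimal sketch using a small in-memory CSV (a stand-in for a file too large to fit in RAM); the data and the filter condition are made up for illustration:

```python
import io
import pandas as pd

# Hypothetical small CSV standing in for a file too large for memory.
csv_data = io.StringIO(
    "id,value\n"
    "1,10\n"
    "2,-5\n"
    "3,7\n"
    "4,-1\n"
    "5,3\n"
)

# chunksize=2 yields DataFrames of up to 2 rows each,
# so only one small chunk is held in memory at a time.
filtered_parts = []
for chunk in pd.read_csv(csv_data, chunksize=2):
    # Filtering works chunk-by-chunk: keep positive values only.
    kept = chunk[chunk["value"] > 0]
    # In a real pipeline you would append `kept` to an output file here
    # (e.g. kept.to_csv(path, mode="a", header=False)) instead of
    # collecting the pieces in memory as this demo does.
    filtered_parts.append(kept)

result = pd.concat(filtered_parts, ignore_index=True)
print(result["id"].tolist())  # ids of the rows that passed the filter
```

Note that sorting or grouping can't be done this way, because those operations need to see rows from every chunk at once, which is why they call for a different tool.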
Upvotes: 1
Reputation: 52862
Yes, they will be stored in memory, and that's exactly why you want to chunk them: chunking lets you avoid reading the whole data set at once, processing it piece by piece before writing out the end result.
You can use the chunksize parameter to tell pandas how many rows should be read for each chunk. If you need the complete set of rows to perform arbitrary lookups, you'll have to back it with some other technology (such as a database).
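As a small illustration of chunksize, here is a sketch that reads 10 rows in chunks of 4 and computes a running sum without ever holding the full data set in one DataFrame. The in-memory CSV is a hypothetical stand-in; with a real file you would pass its path to read_csv:

```python
import io
import pandas as pd

# Hypothetical CSV with a single column x holding the values 0..9.
csv_data = io.StringIO("x\n" + "\n".join(str(i) for i in range(10)))

# chunksize=4 makes read_csv return an iterator of DataFrames
# with at most 4 rows each, rather than one big DataFrame.
total = 0
n_chunks = 0
for chunk in pd.read_csv(csv_data, chunksize=4):
    total += chunk["x"].sum()  # aggregate per chunk, accumulate across chunks
    n_chunks += 1

print(n_chunks, total)  # 3 chunks (4 + 4 + 2 rows), total 0+1+...+9 = 45
```

A running sum works here because it only needs one chunk at a time; an arbitrary lookup by key would not, which is where a database comes in.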
Upvotes: 5