Reputation: 592
I have a 10.8 GB CSV file. I need to read it and put it into a data frame (pandas / Python). How do I know how much RAM I need?
My computer has 8 GB of RAM installed, and it is not sufficient. However, I found Google Colab, which offers almost 12.72 GB of RAM. Would that be sufficient?
Upvotes: 0
Views: 1234
Reputation: 509
One way to estimate how much RAM a CSV might need once read into a DataFrame, without manually computing the size of every field:
pandas provides this function: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.memory_usage.html
# Given a DF d (the example DF is arbitrary, something I had quickly available):
>>> d.shape  # (rows, cols)
(182442, 2)
>>> d.dtypes
sta     float64
elev    float64
dtype: object
>>> d.memory_usage()
Index        128
sta      1459536
elev     1459536
dtype: int64
This gives you the per-column memory footprint, which you can use to do some quick math.
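For the example above: 128 + 1459536 + 1459536 ≈ 2.9 MB in total, i.e. about 16 bytes per row (two float64 columns at 8 bytes each). Multiplying a per-row figure like this by the row count of your full file gives a first-order estimate of the RAM it will need.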
If your CSV is very large, you can create a small, representative sample of the CSV data and read it into a DF using, e.g., https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html, then apply memory_usage as above and scale the result up to estimate how much RAM reading the entire file would call for. Also, make sure you read the sample with the same option parameters for the read operation as you will use for the real thing, since options such as dtype affect the in-memory size.
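A minimal sketch of that sampling approach (the file name and sample size are hypothetical; nrows and memory_usage(deep=True) are standard pandas usage, but treat the result as a rough estimate, not a guarantee):

import pandas as pd

SAMPLE_ROWS = 100_000  # hypothetical sample size

# Read only the first rows, using the same options you plan for the full load
sample = pd.read_csv("big_file.csv", nrows=SAMPLE_ROWS)

# deep=True also counts the actual bytes held by object (string) columns
sample_bytes = sample.memory_usage(deep=True).sum()

# Count the data rows in the full file without loading it into memory
with open("big_file.csv") as f:
    total_rows = sum(1 for _ in f) - 1  # subtract the header line

estimated_bytes = sample_bytes / SAMPLE_ROWS * total_rows
print(f"Estimated RAM for the full DataFrame: {estimated_bytes / 1e9:.1f} GB")

Keep in mind the first N rows are not always representative of the whole file.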
Additional information about the DF is available; see this SO question: get list of pandas dataframe columns based on data type
Armed with this information, you can plan an effective strategy for processing the DF using the chunk iterator options covered in the link immediately above, as sketched below.
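For illustration, a rough sketch of chunked processing (the file name, chunk size, and the "value" column are hypothetical; chunksize is a standard read_csv parameter that yields one DataFrame per chunk):

import pandas as pd

total = 0.0
# With chunksize, read_csv returns an iterator of DataFrames,
# so only one chunk needs to fit in memory at a time
for chunk in pd.read_csv("big_file.csv", chunksize=1_000_000):
    # Process each piece; summing a hypothetical "value" column as an example
    total += chunk["value"].sum()

print(total)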
Upvotes: 2