Oliver

Reputation: 592

How do I know how much RAM I need for a data frame based on the size of a CSV file?

I have a 10.8 GB CSV file that I need to read and load into a data frame (pandas, Python). How do I know how much RAM I need?

My computer has 8 GB of RAM installed, and that is not sufficient. However, I found that Google Colab offers about 12.72 GB of RAM. Would that be sufficient?

Upvotes: 0

Views: 1234

Answers (1)

NateB

Reputation: 509

One way to estimate how much RAM a CSV might need when read as a DF, without having to manually compute the size of every field:

Pandas provides this method: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.memory_usage.html

# Given a DF d (the example DF is arbitrary, something I had quickly available):
>>> d.shape  # (rows, cols)
(182442, 2)

>>> d.dtypes
sta     float64
elev    float64
dtype: object

>>> d.memory_usage()
Index        128
sta      1459536
elev     1459536
dtype: int64

This will give you info you can use to do some quick math.
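For the example above: each float64 column takes 182442 rows × 8 bytes = 1459536 bytes, so the whole DF is roughly 2 × 1459536 + 128 ≈ 2.9 MB, i.e. about 16 bytes per row. Multiplying the bytes-per-row figure by the row count of your full file gives the estimate.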

If your CSV is very large, you can create a small, representative sample of the CSV data, read it into a DF using, e.g., https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html , and then use the function results as above to estimate how much RAM would be called for if you were to read the entire file. Also, make sure you read the sample with the same option parameters for the read operation as you will use for the real thing. (See the sketch below.)
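A minimal sketch of that approach, assuming a hypothetical file big_file.csv and a known total row count (which you could get with, e.g., wc -l):

import pandas as pd

TOTAL_ROWS = 60_000_000   # hypothetical row count of the full CSV; count yours first
SAMPLE_ROWS = 100_000     # rows to sample for the estimate

# Read the sample with the SAME options you plan to use for the full read
sample = pd.read_csv("big_file.csv", nrows=SAMPLE_ROWS)

# deep=True also counts the actual size of object (string) columns
bytes_per_row = sample.memory_usage(deep=True).sum() / len(sample)
est_gb = bytes_per_row * TOTAL_ROWS / 1e9
print(f"Estimated RAM for the full DataFrame: {est_gb:.1f} GB")

Keep in mind that read_csv also needs extra working memory while parsing, so leave some headroom beyond this estimate.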

Additional detail on the DF's composition is available. See this SO: get list of pandas dataframe columns based on data type

Armed with this information, you can plan an effective strategy for processing the DF piecewise, using the chunksize iterator option covered in the link immediately above. A rough sketch follows.
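As an illustration of that kind of chunked processing (the file name and the column being aggregated are assumptions, adapt them to your data):

import pandas as pd

total = 0.0
# chunksize makes read_csv return an iterator of DataFrames instead of one big DF
for chunk in pd.read_csv("big_file.csv", chunksize=1_000_000):
    # process each chunk and keep only the aggregate, so the full file never sits in RAM at once
    total += chunk["elev"].sum()

print(total)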

Upvotes: 2
