Tony Balboa

Reputation: 33

How to speed up pandas dataframe creation from a huge file?

I have a file larger than 7 GB. I am trying to load it into a dataframe using pandas, like this:

import pandas as pd

df = pd.read_csv('data.csv')

But it takes too long. Is there a better way to speed up the dataframe creation? I was considering setting the parameter engine='c', since the documentation says:

"engine{‘c’, ‘python’}, optional
Parser engine to use. The C engine is faster while the python engine is currently more feature-complete."
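
For example, a minimal sketch of that call:

# The C engine is already the default in pandas, so setting it
# explicitly looks something like this:
df = pd.read_csv('data.csv', engine='c')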

But I don't see much gain in speed.

Upvotes: 1

Views: 1408

Answers (1)

Ignacio Alorre

Reputation: 7605

If the problem is that you cannot create the dataframe at all because the file's size makes the operation fail, you can check how to read it in chunks in this answer.
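
A rough sketch of that approach with pandas itself (the file name and chunk size here are just placeholders):

import pandas as pd

# Read the file in fixed-size chunks instead of all at once
chunks = pd.read_csv('data.csv', chunksize=1_000_000)

# Combine the chunks (or process/aggregate each one separately
# if the full dataframe does not fit in memory)
df = pd.concat(chunks, ignore_index=True)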

If the dataframe does get created eventually but the process is too slow, you can use datatable to read the file, convert the result to pandas, and continue with your operations:

import pandas as pd
import datatable as dt

# Read the file with datatable
datatable_df = dt.fread('myfile.csv')

# Then convert the datatable frame into a pandas dataframe
pandas_df = datatable_df.to_pandas()
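
datatable's fread reads the file with multiple threads, which is where most of the speed-up over pandas' single-threaded CSV parser comes from.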

Upvotes: 1
