Reputation: 4200
I am trying to load this CSV file into a pandas data frame using
import pandas as pd
filename = '2016-2018_wave-IV.csv'
df = pd.read_csv(filename)
However, despite my PC not being particularly slow (8 GB RAM, 64-bit Python) and the file being somewhat, but not extraordinarily, large (< 33 MB), loading the file takes more than 10 minutes. It is my understanding that this shouldn't take nearly that long, and I would like to figure out what's behind it.
(As suggested in similar questions, I have tried using the chunksize and usecols parameters (EDIT: and also low_memory), yet without success; so I believe this is not a duplicate but has more to do with the file or the setup.)
Could someone give me a pointer? Many thanks. :)
Upvotes: 3
Views: 823
Reputation: 4200
To summarize and expand the answer by @Hubert Dudek:
The issue was with the file: not only did it include " characters at the start and end of every line, but also within the lines themselves. After I fixed the former, the latter caused the column attribution to be messed up.
Upvotes: 0
Reputation: 1722
I was testing the file you shared, and the problem is that this CSV file has leading and trailing double quotes on every line (so pandas thinks the whole line is one column). They have to be removed before processing, for example by using sed on Linux, by processing and re-saving the file in Python, or simply by replacing all the double quotes in a text editor.
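A minimal sketch of the "process and re-save in Python" route. The file names and sample contents here are made up for illustration; the point is that with the outer quotes in place, pandas parses each line as a single quoted field, and stripping them restores the columns:

```python
import pandas as pd

# Recreate the defect described above on a tiny demo file:
# every line is wrapped in double quotes.
with open('broken.csv', 'w', encoding='utf-8') as f:
    f.write('"a,b,c"\n"1,2,3"\n"4,5,6"\n')

# pandas treats each quoted line as one column here.
print(pd.read_csv('broken.csv').shape)  # (2, 1)

# Strip the leading/trailing quotes from every line and re-save.
with open('broken.csv', encoding='utf-8') as fin, \
     open('fixed.csv', 'w', encoding='utf-8') as fout:
    for line in fin:
        fout.write(line.rstrip('\n').strip('"') + '\n')

df = pd.read_csv('fixed.csv')
print(df.shape)  # (2, 3)
```

Note that `str.strip('"')` only removes quotes at the ends of each line; if there are stray quotes inside the lines as well (as the asker found), a blanket `line.replace('"', '')` may be needed instead.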
Upvotes: 1