Ranyk

Reputation: 267

Pandas: read_csv reading large csv file with no NaNs

I have a large dataset in .csv format, around 60 GB, in which more than 60% of the values are missing in some columns and rows. Since it's not possible to read such a huge file directly into a Jupyter notebook, I want to read only specific columns and only non-null rows using pandas.read_csv. How can this be done?

Thanks in advance!!

Upvotes: 1

Views: 419

Answers (2)

Naga kiran

Reputation: 4607

You can read the CSV file chunk by chunk and retain only the rows you want to keep:

import pandas as pd

# error_bad_lines was removed in pandas 2.0; use on_bad_lines='skip' there
iter_csv = pd.read_csv('sample.csv', usecols=['col1', 'col2'], iterator=True, chunksize=10000, error_bad_lines=False)
data = pd.concat([chunk.dropna(how='all') for chunk in iter_csv])
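Note that dropna(how='all') only drops rows in which every selected column is missing. If you instead want rows where all the columns of interest are present, a subset-based dropna works; a minimal sketch, assuming the same hypothetical file and column names as above:

import pandas as pd

# Keep only rows where both selected columns have a value
iter_csv = pd.read_csv('sample.csv', usecols=['col1', 'col2'], iterator=True, chunksize=10000)
data = pd.concat(chunk.dropna(subset=['col1', 'col2']) for chunk in iter_csv)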

Upvotes: 2

Albin

Reputation: 912

Check the following suggestion from a previous post.

The pandas documentation suggests that you can read a CSV file selecting only the columns you want:

import pandas as pd

df = pd.read_csv('some_data.csv', usecols=['col1', 'col2'], low_memory=True)
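For a 60 GB file, usecols alone may still not fit in memory, so a quick preview with nrows can help you inspect the data before committing to a full chunked read. A sketch, again assuming the hypothetical file and column names from the answer above:

import pandas as pd

# Load only the first 1,000 rows of the two columns to inspect the data cheaply
preview = pd.read_csv('some_data.csv', usecols=['col1', 'col2'], nrows=1000)
print(preview.isna().mean())  # fraction of missing values per column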

Upvotes: 2
