Reputation: 83
I am trying to import a .csv file from my Downloads folder. Usually, the read_csv function imports all of the rows, even when there are millions of them. In this case, my file has 236,905 rows, but exactly 100,000 are loaded.
import pandas as pd
df = pd.read_csv(r'C:\Users\user\Downloads\df.csv', nrows=9999999, low_memory=False)
Upvotes: 3
Views: 280
Reputation: 423
You need to read the file in chunks using the chunksize= parameter:
import pandas as pd

# read the file lazily in chunks of 1,000 rows, then concatenate them into one DataFrame
temporary = pd.read_csv(r'C:\Users\user\Downloads\df.csv', iterator=True, chunksize=1000)
df = pd.concat(temporary, ignore_index=True)
ignore_index=True resets the index so it doesn't repeat across chunks.
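If you'd rather not hold everything in memory at once, you can also process each chunk inside a loop instead of concatenating them. A minimal sketch, assuming the same file path as in the question and that a simple row count is all you need:

import pandas as pd

total_rows = 0
for chunk in pd.read_csv(r'C:\Users\user\Downloads\df.csv', chunksize=1000):
    # each chunk is an ordinary pandas DataFrame with up to 1,000 rows
    total_rows += len(chunk)
print(total_rows)  # should report 236,905 if every row was read

Counting rows this way is also a quick check of whether the whole file is actually being read.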
Upvotes: 0
Reputation: 31
I came across the same problem with a file containing 5M rows. I first tried this option:
import pandas as pd
tp = pd.read_csv('yourfile.csv', iterator=True, chunksize=1000)
data_customers = pd.concat(tp, ignore_index=True)
It did work, but in my case some rows were not read properly, since some columns contained the character ',', which is the default delimiter in read_csv.
The other solution is to use Dask. It has an object called DataFrame (like pandas). Dask reads your file and constructs a Dask DataFrame composed of several pandas DataFrames. It's a great solution for parallel computing.
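A minimal sketch of the Dask route, assuming the same 'yourfile.csv' and that the result fits in memory once loaded:

import dask.dataframe as dd

# Dask splits the file into partitions and reads them in parallel
ddf = dd.read_csv('yourfile.csv')
# .compute() materialises the lazy Dask DataFrame as a single pandas DataFrame
df = ddf.compute()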
Hope it helps
Upvotes: 2