luc

Reputation: 83

read_csv stops at 100000

I am trying to import a .csv file from my Downloads folder. Usually, read_csv imports all of the rows, even when there are millions of them. In this case, my file has 236,905 rows, but exactly 100,000 are loaded.

import pandas as pd

df = pd.read_csv(r'C:\Users\user\Downloads\df.csv', nrows=9999999, low_memory=False)

Upvotes: 3

Views: 280

Answers (2)

Pythoneer

Reputation: 423

You need to read the file in chunks using the chunksize= parameter:

temporary = pd.read_csv(r'C:\Users\user\Downloads\df.csv', iterator=True, chunksize=1000)
df = pd.concat(temporary, ignore_index=True)

ignore_index=True resets the index so it doesn't repeat across chunks.
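
If the concatenated DataFrame is still too large to hold comfortably in memory, you can also filter each chunk as it is read and only keep the rows you need; a minimal sketch, where the 'status' column and its value are hypothetical:

import pandas as pd

# Read the file 100,000 rows at a time and keep only the rows of interest
# from each chunk, so the full file is never held in memory at once.
chunks = []
for chunk in pd.read_csv(r'C:\Users\user\Downloads\df.csv', chunksize=100_000):
    chunks.append(chunk[chunk['status'] == 'active'])  # 'status' is a hypothetical column

df = pd.concat(chunks, ignore_index=True)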

Upvotes: 0

Walid

Reputation: 31

I came across the same problem with a file containing 5M rows.

I first tried this option:

 tp = pd.read_csv('yourfile.csv', iterator=True, chunksize=1000)
 data_customers = pd.concat(tp, ignore_index=True)

It did work, but in my case some rows were not read properly, because some columns contained the character ',', which is the delimiter used by read_csv.
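
If those fields are quoted in the file, telling read_csv about the quote character (and warning on malformed lines instead of silently mis-parsing them) can help; a minimal sketch, assuming the embedded commas are wrapped in double quotes and the file name is a placeholder:

import pandas as pd

# Fields containing the delimiter (',') must be quoted for the parser to read
# them correctly; on_bad_lines='warn' (pandas >= 1.3) emits a warning and skips
# rows whose column count still doesn't match the header.
df = pd.read_csv('yourfile.csv', sep=',', quotechar='"', on_bad_lines='warn')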

The other solution is to use Dask. It has an object called "DataFrame" (as in pandas). Dask reads your file and constructs a Dask DataFrame composed of several pandas DataFrames. It's a great solution for parallel computing.
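
For reference, a minimal sketch of the Dask approach (the file name is just a placeholder):

import dask.dataframe as dd

# Dask splits the CSV into partitions, each one a pandas DataFrame, and reads
# them lazily and in parallel.
ddf = dd.read_csv('yourfile.csv')

print(len(ddf))       # triggers the computation and returns the row count
df = ddf.compute()    # materialize as a single pandas DataFrame (if it fits in RAM)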

Hope it helps

Upvotes: 2
