Vitaliy

Reputation: 137

Pandas read_csv strange behaviour

Please help me understand the reason for the following read_csv behaviour. I am trying to read a huge file in chunks:

import pandas as pd

c = 1
for chunk in pd.read_csv(filename, chunksize=chunksize):
    print 'chunk ', str(c), ' started'
    # ...data normalization...
    # ...saving the transformed data to file...
    c += 1

I get an error like this:

sys:1: DtypeWarning: Columns (...) have mixed types. Specify dtype option on import or set low_memory=False.
chunk  19  started
Traceback (most recent call last):
...
TypeError: unsupported operand type(s) for -: 'str' and 'float'

From the error I can see that, for some reason, at chunk 19 pandas interpreted the float data as strings and cannot perform the '-' operation.

However, if I skip the first 18 chunks and start from chunk 19, it works fine. Intuition says it might be some memory problem, but I would like to understand the reason.

Upvotes: 0

Views: 355

Answers (1)

Batman

Reputation: 8917

It's not a memory problem.

Pandas makes guesses about what the data types should be if you don't specify the dtype argument. Sometimes it realises it's made a mistake and will convert the data type of a column on the fly, if it thinks that's the correct thing to do. In this case, it appears to be guessing that the correct type is a numerical one, then later on encountering some data that makes it think the column should really be strings, so it converts. Does the data have anything like 'N/A' in it, by any chance?
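Here's a minimal sketch (with made-up data) of how a column can flip dtype between chunks when a stray string shows up partway through the file. Note the stray value has to be one pandas doesn't treat as NA by default ('N/A' itself would be parsed as NaN):

import io
import pandas as pd

# Made-up data: the column looks numeric at first, then a stray
# string appears further down the file.
data = io.StringIO("value\n1.0\n2.0\n3.0\nmissing\n")

for i, chunk in enumerate(pd.read_csv(data, chunksize=2)):
    # First chunk: float64. Second chunk: object, so
    # chunk['value'] - 1.0 would raise a TypeError like the one above.
    print("chunk %d dtype: %s" % (i, chunk['value'].dtype))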

Just specify the dtype argument. It will make read_csv faster and more memory-efficient, and you'll either fix the problem or get a better idea of what's causing it.
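A minimal sketch of what that could look like; the column names here ('price', 'product_id') are hypothetical stand-ins for whatever is actually in your file:

import numpy as np
import pandas as pd

# Hypothetical column names and types; substitute the real ones.
dtypes = {'price': np.float64, 'product_id': str}

for chunk in pd.read_csv(filename, dtype=dtypes, chunksize=chunksize):
    # Every chunk now arrives with the same known dtypes, so a stray
    # string in a numeric column fails loudly at read time instead of
    # silently flipping the column to object.
    process(chunk)  # placeholder for your normalization/saving steps

If the numeric columns use some placeholder string for missing values, you can also pass it via na_values (e.g. na_values=['missing']) so it's parsed as NaN instead of breaking the dtype.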

Upvotes: 1
