Long Ye

Reputation: 147

TypeError when using chunksize argument to pandas method pd.read_csv()

I have a csv file like this:

   1  1.1  0      0.1  13.1494  32.7957  2.27266  0.2  3  5.4   ...     \
0  2    2  0  8.17680  4.76726  25.6957  1.13633    0  3  4.8   ...      
1  3    0  0  8.22718  2.35340  15.2934  1.13633    0  3  4.8   ...

I read the file using pandas.read_csv:

data_raw = pd.read_csv(filename, chunksize=chunksize)

Now, I want to make a dataframe:

df = pd.DataFrame(data_raw, columns=['id', 'colNam1', 'colNam2', 'colNam3',...])

But I get this error:

  File "test.py", line 143, in <module>
    data = load_frame(csvfile)
  File "test.py", line 53, in load_frame
    'id', 'colNam1', 'colNam2', 'colNam3',...])
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 325, in __init__
    raise TypeError("data argument can't be an iterator")
TypeError: data argument can't be an iterator

I don't know why.

Upvotes: 0

Views: 1169

Answers (1)

EdChum

Reputation: 394459

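This is because passing chunksize to read_csv returns a TextFileReader, which is an iterator that yields DataFrame chunks, rather than a DataFrame. The DataFrame constructor does not accept an iterator as its data argument, hence the TypeError.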

To demonstrate:

In [67]:
import io
import pandas as pd
t="""a         b
0 -0.278303 -1.625377
1 -1.954218  0.843397
2  1.213572 -0.098594"""
df = pd.read_csv(io.StringIO(t), chunksize=1)
df

Out[67]:
<pandas.io.parsers.TextFileReader at 0x7e9e8d0>

You can see that df here is not a DataFrame but a TextFileReader object.

It's unclear to me what you're really trying to achieve, but if you only want to read a specific number of rows you can pass nrows instead, which returns a DataFrame directly:

In [69]:
t="""a         b
0 -0.278303 -1.625377
1 -1.954218  0.843397
2  1.213572 -0.098594"""
df = pd.read_csv(io.StringIO(t), nrows=1)
df

Out[69]:
             a         b
0  0 -0.278303 -1.625377

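To solve your original problem, you need to iterate over the TextFileReader (the object returned when chunksize is passed, not the nrows result above) in order to get the chunks:

In [73]:
df = pd.read_csv(io.StringIO(t), chunksize=1)  # TextFileReader, not a DataFrame
for r in df:
    print(r)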

             a         b
0  0 -0.278303 -1.625377
             a         b
1  1 -1.954218  0.843397
             a         b
2  2  1.213572 -0.098594
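
Because each chunk is an ordinary DataFrame, you can also process the chunks one at a time without ever holding the whole file in memory. The following is only a minimal sketch: the sample data, the column names a and b, and the running totals are made up for illustration and are not taken from the question.

```python
import io
import pandas as pd

# Hypothetical comma-separated sample data standing in for a large file.
csv_text = u"a,b\n1,10\n2,20\n3,30\n"

n_rows = 0
total_b = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    # Each chunk is a regular DataFrame holding at most 2 rows.
    n_rows += len(chunk)
    total_b += chunk["b"].sum()

print(n_rows, total_b)
```

This pattern is usually what chunksize is meant for: only one chunk is in memory at a time, and you keep just the aggregate you need.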

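If you want to generate a single df from the chunks, append each chunk to a list and then call concat. The reader is exhausted after one pass, so create it again first:

In [77]:
df = pd.read_csv(io.StringIO(t), chunksize=1)  # fresh reader
df_list=[]
for r in df:
    df_list.append(r)
pd.concat(df_list)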

Out[77]:
             a         b
0  0 -0.278303 -1.625377
1  1 -1.954218  0.843397
2  2  1.213572 -0.098594
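
Applied to the question, one possible approach is to give read_csv the column names directly and then concatenate the chunks. This is a sketch under assumptions: it assumes the file is comma-separated and has no header row, and the sample data and the three column names below are placeholders, since the question elides the real list.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file in the question (no header row).
csv_text = u"1,1.1,0\n2,2,0\n3,0,0\n"

# Placeholder column names; substitute the full list from your own code.
col_names = ["id", "colNam1", "colNam2"]

# header=None stops pandas from treating the first data row as a header;
# names= assigns the column labels to every chunk.
reader = pd.read_csv(io.StringIO(csv_text), header=None,
                     names=col_names, chunksize=2)

# Collect the chunks and build a single DataFrame with a fresh 0..n-1 index.
chunks = [chunk for chunk in reader]
df = pd.concat(chunks, ignore_index=True)

print(df)
```

If the file fits comfortably in memory, dropping chunksize and calling read_csv with names= alone gives the same DataFrame in one step.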

Upvotes: 1
