nbecker

Reputation: 1719

missing data in pandas read_csv

My data:


a,b,c,d,e,f
1.5,4.8,,6.3
1.60,5.2,6.5,7.2
1.70,5.5,6.6,8.3,5.7
1.80,6.1,6.7,9.7,6.2
1.90,7.1,6.8,11.1,6.7
2,,6.8,12.5,7.3
2.08,,,,7.8
2.1,,7.2
2.2,,8.0
2.3,,8.7
2.4,,9.2,8.2

from pandas import read_csv
ds = read_csv ('lin-nan.dat', index_col=0, sep=',')

Traceback (most recent call last):
  File "read_lin.py", line 7, in <module>
    ds = read_csv ('lin-nan.dat', index_col=0, sep=',')
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 253, in read_csv
    return _read(TextParser, filepath_or_buffer, kdict)
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 202, in _read
    return parser.get_chunk()
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 844, in get_chunk
    alldata = self._rows_to_cols(content)
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 809, in _rows_to_cols
    raise ValueError(msg)
ValueError: Expecting 6 columns, got 5 in row 1

Upvotes: 0

Views: 5492

Answers (2)

sebastibe

Reputation: 587

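You can use the error_bad_lines=False option of read_csv. It skips badly formatted lines and prints a warning for each one. In pandas 1.3 and later, the equivalent option is on_bad_lines='warn' (or on_bad_lines='skip' to drop them silently).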
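
As a rough sketch of what this looks like on a recent pandas (1.3 or later, which is an assumption about your install), the snippet below uses on_bad_lines='skip' on a small made-up sample, not the question's file. Note that current parsers only treat rows with too many fields as bad; rows with too few fields are padded with NaN, so this option would not drop the short rows in the question's data.

```python
from io import StringIO

import pandas as pd

# Made-up sample: the second data row has one field too many.
sample = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"

# on_bad_lines="skip" drops the over-long row instead of raising ParserError.
skipped = pd.read_csv(StringIO(sample), on_bad_lines="skip")

print(skipped)
```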

Upvotes: 1

Andy Hayden

Reputation: 375535

The problem is that the header names six columns, but none of your rows has six fields (the longest has five). I don't think there is a keyword in read_csv to overcome this.

One solution is to be more explicit:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: df = pd.read_csv('lin-nan.dat', names=list('abcde'), index_col=0, skiprows=1)

In [4]: df['f'] = np.nan

In [5]: df
Out[5]: 
        b    c     d    e   f
a                            
1.50  4.8  NaN   6.3  NaN NaN
1.60  5.2  6.5   7.2  NaN NaN
1.70  5.5  6.6   8.3  5.7 NaN
1.80  6.1  6.7   9.7  6.2 NaN
1.90  7.1  6.8  11.1  6.7 NaN
2.00  NaN  6.8  12.5  7.3 NaN
2.08  NaN  NaN   NaN  7.8 NaN
2.10  NaN  7.2   NaN  NaN NaN
2.20  NaN  8.0   NaN  NaN NaN
2.30  NaN  8.7   NaN  NaN NaN
2.40  NaN  9.2   8.2  NaN NaN
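
A variation on the same idea, offered as a sketch that assumes a reasonably recent pandas: pass all six header names through names and replace the file's header row with header=0, so that short rows are padded with NaN and column f does not have to be added by hand. The data is inlined through StringIO only to keep the example self-contained.

```python
from io import StringIO

import pandas as pd

# The question's data, inlined so the example runs without 'lin-nan.dat'.
raw = """a,b,c,d,e,f
1.5,4.8,,6.3
1.60,5.2,6.5,7.2
1.70,5.5,6.6,8.3,5.7
1.80,6.1,6.7,9.7,6.2
1.90,7.1,6.8,11.1,6.7
2,,6.8,12.5,7.3
2.08,,,,7.8
2.1,,7.2
2.2,,8.0
2.3,,8.7
2.4,,9.2,8.2
"""

# header=0 discards the file's own header row; names supplies all six labels.
# Rows with fewer than six fields are filled with NaN on the right.
df = pd.read_csv(StringIO(raw), names=list("abcdef"), header=0, index_col=0)

print(df)
```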

Upvotes: 0
