Reputation: 1719
a,b,c,d,e,f
1.5,4.8,,6.3
1.60,5.2,6.5,7.2
1.70,5.5,6.6,8.3,5.7
1.80,6.1,6.7,9.7,6.2
1.90,7.1,6.8,11.1,6.7
2,,6.8,12.5,7.3
2.08,,,,7.8
2.1,,7.2
2.2,,8.0
2.3,,8.7
2.4,,9.2,8.2
from pandas import read_csv
ds = read_csv ('lin-nan.dat', index_col=0, sep=',')
Traceback (most recent call last):
  File "read_lin.py", line 7, in <module>
    ds = read_csv ('lin-nan.dat', index_col=0, sep=',')
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 253, in read_csv
    return _read(TextParser, filepath_or_buffer, kdict)
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 202, in _read
    return parser.get_chunk()
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 844, in get_chunk
    alldata = self._rows_to_cols(content)
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 809, in _rows_to_cols
    raise ValueError(msg)
ValueError: Expecting 6 columns, got 5 in row 1
Upvotes: 0
Views: 5492
Reputation: 587
You can use the error_bad_lines=False option of the read_csv function. Instead of raising an error, it skips malformed lines (those with too many fields) and prints a warning for each one.
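As a rough illustration of the skipping behaviour, here is a minimal sketch. It assumes pandas 1.3 or later, where the on_bad_lines parameter took over from error_bad_lines; the sample data is made up for the example and is not the file from the question.

```python
import io
import pandas as pd

# Hypothetical sample: the second data line has one field too many.
raw = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"

# Assumes pandas >= 1.3; older versions used error_bad_lines=False instead.
# on_bad_lines="skip" drops the over-long line silently, "warn" also reports it.
df = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")
print(df)
```

This targets lines with extra fields. The file in the question has lines with too few fields, so the option may not help there.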
Upvotes: 1
Reputation: 375535
The problem is that the header names 6 columns, but none of your rows has 6 fields (the longest has 5). I don't think there is a keyword in read_csv to overcome this.
One solution is to be more explicit:
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: df = pd.read_csv('lin-nan.dat', names=list('abcde'), index_col=0, skiprows=1)
In [4]: df['f'] = np.nan
In [5]: df
Out[5]:
        b    c     d    e   f
a
1.50  4.8  NaN   6.3  NaN NaN
1.60  5.2  6.5   7.2  NaN NaN
1.70  5.5  6.6   8.3  5.7 NaN
1.80  6.1  6.7   9.7  6.2 NaN
1.90  7.1  6.8  11.1  6.7 NaN
2.00  NaN  6.8  12.5  7.3 NaN
2.08  NaN  NaN   NaN  7.8 NaN
2.10  NaN  7.2   NaN  NaN NaN
2.20  NaN  8.0   NaN  NaN NaN
2.30  NaN  8.7   NaN  NaN NaN
2.40  NaN  9.2   8.2  NaN NaN
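The same idea can be written so that it does not depend on the file on disk. The sketch below is only an illustration: it uses a few rows copied from the question as an in-memory string, and swaps the df['f'] = np.nan assignment for reindex, which adds any missing column filled with NaN. The variable names (raw, header, widest) are made up for the example.

```python
import io
import pandas as pd

# A few rows from the question, held in memory instead of lin-nan.dat.
raw = """a,b,c,d,e,f
1.5,4.8,,6.3
1.70,5.5,6.6,8.3,5.7
2.08,,,,7.8
2.4,,9.2,8.2
"""

lines = raw.splitlines()
header = lines[0].split(",")                           # ['a', ..., 'f']
widest = max(len(row.split(",")) for row in lines[1:])  # 5 fields at most

# Read only as many named columns as the data actually contains,
# skipping the original header line.
df = pd.read_csv(io.StringIO(raw), names=header[:widest],
                 index_col=0, skiprows=1)

# Add the columns the header promised but the data never filled (here 'f').
df = df.reindex(columns=header[1:])
print(df)
```

Using reindex keeps the column order from the header and works unchanged if more than one trailing column is missing.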
Upvotes: 0