Reputation: 31
For example, a row in the data looks like this:
-1 0:183.3575741549828 1:3.11164735151736 2:2.171277907851733 3:26.68849990272964 4:24.76677388937082 5:0.02710337995527495
Each value carries an explicit index because any attribute whose index is not specified is assumed to be zero.
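To make the "missing index means zero" rule concrete, here is a minimal sketch that expands one such row into a dense list. The helper name `parse_sparse_row` and the `n_features` argument are assumptions made for illustration; the real number of attributes has to come from your data.

```python
def parse_sparse_row(line, n_features):
    """Parse '<label> idx:val idx:val ...' into (label, dense list).

    n_features is an assumed, caller-supplied total number of attributes.
    """
    parts = line.split()
    label = float(parts[0])
    dense = [0.0] * n_features  # indices that are not listed stay zero
    for item in parts[1:]:
        idx, val = item.split(":")
        dense[int(idx)] = float(val)
    return label, dense


# Hypothetical shortened row in the same layout as the example above
label, features = parse_sparse_row("-1 0:183.5 2:2.25 5:0.5", n_features=8)
print(label, features)
```

Because each row lists only its non-zero attributes, the number of whitespace-separated tokens can differ from row to row.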
I'm trying to use the statement:
train = pd.read_csv('train.csv', header=None, delim_whitespace=True).values
It raises the following error:
train = pd.read_csv('train.csv', header=None, delim_whitespace=True).values
File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 401, in _read
    data = parser.read()
File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 939, in read
    ret = self._engine.read(nrows)
File "/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 1508, in read
    data = self._reader.read(nrows)
File "pandas/parser.pyx", line 848, in pandas.parser.TextReader.read (pandas/parser.c:10415)
File "pandas/parser.pyx", line 870, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:10691)
File "pandas/parser.pyx", line 924, in pandas.parser.TextReader._read_rows (pandas/parser.c:11437)
File "pandas/parser.pyx", line 911, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:11308)
File "pandas/parser.pyx", line 2024, in pandas.parser.raise_parser_error (pandas/parser.c:27037)
pandas.io.common.CParserError: Error tokenizing data. C error: Expected 132 fields in line 5, saw 143
I can't seem to figure out the problem here. Any help would be great!
Upvotes: 0
Views: 286
Reputation: 886
Based on your data description and the error message, my guess is that the rows in your CSV file do not all have the same number of fields. Try specifying the column names explicitly:
my_cols = range(0,4125)
train = pd.read_csv('train.csv', header=None, delim_whitespace=True, names=my_cols).values
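If you do not know the widest row in advance, one option is to scan the file once and count the fields per line. This is only a sketch: `max_field_count` is a hypothetical helper, and the in-memory sample stands in for your `train.csv`.

```python
import io


def max_field_count(lines):
    """Return the largest number of whitespace-separated fields on any line.

    Raises ValueError if `lines` is empty.
    """
    return max(len(line.split()) for line in lines)


# Hypothetical stand-in for open('train.csv'); rows have 3, 2 and 4 fields
sample = io.StringIO("-1 0:1.5 1:2.0\n1 0:0.5\n-1 0:3.0 4:1.0 7:2.5\n")
n_cols = max_field_count(sample)
my_cols = list(range(n_cols))  # candidate value for the names= argument
print(n_cols, my_cols)
```

The resulting `my_cols` can then be passed as `names=my_cols` in the `read_csv` call, so that shorter rows are padded with NaN instead of triggering the tokenizing error.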
For more help, see the related questions "import csv with different number of columns per row using Pandas" and "Handling Variable Number of Columns with Pandas - Python".
Upvotes: 0