Reputation: 696
I'm trying to import the contents of a text file from a web page into a pandas DataFrame.
However, when I run the code below and then try to print the column names, I get an error.
import pandas as pd
df = pd.read_csv(
    "http://cs.joensuu.fi/sipu/datasets/s1.txt",
    index_col=None,
    sep=" ",
)
Which results in the error below:
File "/Users/user/Desktop/Folder/Src/spiral.py", line 8, in <module>
df = pd.read_csv('http://cs.joensuu.fi/sipu/datasets/s1.txt', index_col=None, sep=" ")
File "/Users/user/miniforge3/envs/test_venv/lib/python3.8/site-packages/pandas/io/parsers.py", line 610, in read_csv
return _read(filepath_or_buffer, kwds)
File "/Users/user/miniforge3/envs/test_venv/lib/python3.8/site-packages/pandas/io/parsers.py", line 468, in _read
return parser.read(nrows)
File "/Users/user/miniforge3/envs/test_venv/lib/python3.8/site-packages/pandas/io/parsers.py", line 1057, in read
index, columns, col_dict = self._engine.read(nrows)
File "/Users/user/miniforge3/envs/test_venv/lib/python3.8/site-packages/pandas/io/parsers.py", line 2061, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 756, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 771, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 827, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 814, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 1951, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 9 fields in line 1334, saw 10
How can I import the text file from the URL above into a pandas DataFrame with the values in separate columns?
Upvotes: 1
Views: 397
Reputation: 13478
This error message means that a line does not have the expected number of fields: the parser expected 9, but line 1334 has 10.
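To see where a count like 9 can come from, here is a minimal sketch. It assumes (this is a guess about the file layout, not something copied from it) that each number is right-aligned and padded with several spaces, and that sep=" " treats every single space as a delimiter. The sample strings below are made up for illustration:

```python
# Hypothetical line layout: four spaces before each number (an assumption
# about the file, not copied from it).
first_line = "    111111    550946"
fields = first_line.split(" ")
print(len(fields))  # 9 (seven empty strings plus two numbers)

# A shorter number padded to the same width needs one more space,
# which produces one more (empty) field.
other_line = "    111111     50946"
other_fields = other_line.split(" ")
print(len(other_fields))  # 10
```

If the file is laid out like this, the extra field is only padding, so the "faulty" line may hold perfectly valid data.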
As stated in the pandas documentation for read_csv(), you can skip the faulty rows by setting on_bad_lines
to "skip" (available from pandas 1.3; earlier versions use error_bad_lines=False instead), like this:
import pandas as pd
df = pd.read_csv(
    "http://cs.joensuu.fi/sipu/datasets/s1.txt",
    index_col=None,
    sep=" ",
    on_bad_lines="skip",
)
print(df)
# Outputs
Unnamed: 0 Unnamed: 1 Unnamed: 2 ... Unnamed: 6 Unnamed: 7 550946
0 NaN NaN NaN ... NaN NaN 557965
1 NaN NaN NaN ... NaN NaN 575538
2 NaN NaN NaN ... NaN NaN 551446
3 NaN NaN NaN ... NaN NaN 608046
4 NaN NaN NaN ... NaN NaN 557588
... ... ... ... ... ... ... ...
4921 NaN NaN NaN ... NaN NaN 853940
4922 NaN NaN NaN ... NaN NaN 863963
4923 NaN NaN NaN ... NaN NaN 861267
4924 NaN NaN NaN ... NaN NaN 858702
4925 NaN NaN NaN ... NaN NaN 842566
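Note that the output above is mostly NaN columns, and the first data row has been consumed as the header. If the goal is to get the numbers into separate columns, a possible alternative is to split on runs of whitespace and tell pandas there is no header row. The sketch below uses an in-memory stand-in with made-up values instead of the real URL, and the column names "x" and "y" are hypothetical:

```python
import io

import pandas as pd

# Stand-in for the remote file: illustrative values padded with runs of spaces.
sample = (
    "    100    200\n"
    "    300    400\n"
    "     50      7\n"
)

df = pd.read_csv(
    io.StringIO(sample),  # replace with the URL to read the real file
    sep=r"\s+",           # any run of whitespace counts as one separator
    header=None,          # the file has no header row
    names=["x", "y"],     # hypothetical column names
)
print(df)
```

With this approach no rows need to be skipped, since lines with different amounts of padding still split into the same number of fields.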
Upvotes: 1