Reputation: 103
I am trying to read a text file with pandas' read_csv in Python. My text file looks like this (all values are numbers):
35 61 7 1 0 # with leading white spaces
0 1 1 1 1 1 # with leading white spaces
33 221 22 0 1 # without leading white spaces
233 2 # without leading white spaces
1(01-02),2(02-03),3(03-04) # this line causes 'Error tokenizing data. C error: Expected 1 fields in line 5, saw 3'
My Python code is as follows:
import pandas as pd
df = pd.read_csv('example.txt', header=None)
df
The output is:
CParserError: 'Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
Before dealing with the leading white space, I need to handle the 'Error tokenizing data.' issue first, so I changed the code to:
import pandas as pd
df = pd.read_csv('example.txt', header=None, error_bad_lines=False)
df
This gives me the data with leading white space as I intended, but the data in line 5 is gone. The output is as follows:
b'Skipping line 5: expected 1 fields, saw 3\n
35 61 7 1 0 # with leading white spaces as intended
0 1 1 1 1 1 # with leading white spaces as intended
33 221 22 0 1 # without leading white spaces
233 2 # without leading white spaces
# 5th line disappeared (not my intention).
So I changed my code as shown here to get the 5th line:
import pandas as pd
df = pd.read_csv('example.txt', header=None, sep=':::', engine='python')
df
I successfully got the data in line 5, but the leading white space in lines 1 and 2 is gone, as follows:
35 61 7 1 0 # without leading white spaces(not my intention)
0 1 1 1 1 1 # without leading white spaces(not my intention)
33 221 22 0 1 # without leading white spaces
233 2 # without leading white spaces
1(01-02),2(02-03),3(03-04) # I successfully got this line as intended.
I have seen several posts on preserving leading spaces in strings, but I can't find any that cover preserving leading white space in front of numbers. Thanks for your help.
Upvotes: 3
Views: 915
Reputation: 402872
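The key is the separator. If you set sep to '^', a character that never appears in the file, each line is read as a single field and its leading white space is kept. Note that pandas takes a single-character sep literally, so '^' acts here as a plain delimiter rather than as the regex start-of-line metacharacter.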
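# pandas 2.0 removed the squeeze keyword of read_csv; calling .squeeze('columns') on the result gives the same Series.
s = pd.read_csv('example.txt', header=None, sep='^').squeeze('columns')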
s
0 35 61 7 1 0
1 0 1 1 1 1 1
2 33 221 22 0 1
3 233 2
4 1(01-02),2(02-03),3(03-04)
Name: 0, dtype: object
s[1]
' 0 1 1 1 1 1'
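If you want to try this without creating a file, here is a minimal sketch that feeds the same kind of data through io.StringIO. The sample text is only a stand-in for example.txt, and the exact number of leading spaces on the first two lines is my assumption, since the question does not show it.

```python
import io

import pandas as pd

# Hypothetical stand-in for example.txt; the amount of leading
# white space on the first two lines is assumed for illustration.
text = (
    "  35 61 7 1 0\n"
    "   0 1 1 1 1 1\n"
    "33 221 22 0 1\n"
    "233 2\n"
    "1(01-02),2(02-03),3(03-04)\n"
)

# '^' does not occur in the data, so every line ends up in one column.
df = pd.read_csv(io.StringIO(text), header=None, sep="^")

# Collapse the single-column DataFrame into a Series.
s = df.squeeze("columns")

# repr() makes the preserved leading spaces visible.
for value in s:
    print(repr(value))
```

Every value comes back as a string, so the numbers still have to be split and converted afterwards if you need them as integers.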
Upvotes: 3