Sang-il Ahn

Reputation: 103

How to preserve leading whitespace in a pandas Series in Python?

I am trying to read a text file with pandas' read_csv in Python. My text file looks like this (all values are numbers):

 35 61  7 1 0              # with leading white spaces
  0 1 1 1 1 1              # with leading white spaces
33 221 22 0 1              # without leading white spaces
233   2                    # without leading white spaces
1(01-02),2(02-03),3(03-04) # this line causes 'Error tokenizing data. C error: Expected 1 fields in line 5, saw 3'

My Python code is as follows:

import pandas as pd
df = pd.read_csv('example.txt', header=None)
df

The output is:

CParserError: 'Error tokenizing data. C error: Expected 1 fields in line 5, saw 3'
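Why the parser complains: read_csv splits on ',' by default. The first four lines contain no commas, so each parses as a single field, while line 5 has two commas and therefore yields three fields, which is exactly what the error message reports. A minimal sketch of that count, assuming the file content without the `#` annotations:

```python
# Annotation-free copy of the five lines in example.txt (assumed content)
lines = [
    " 35 61  7 1 0",
    "  0 1 1 1 1 1",
    "33 221 22 0 1",
    "233   2",
    "1(01-02),2(02-03),3(03-04)",
]

# With the default sep=',', a line's field count is its comma count + 1
field_counts = [line.count(",") + 1 for line in lines]
print(field_counts)  # [1, 1, 1, 1, 3]
```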

Before dealing with the leading whitespace, I need to handle the 'Error tokenizing data' issue first, so I changed my code to:

import pandas as pd
df = pd.read_csv('example.txt', header=None, error_bad_lines=False)  # removed in pandas 2.0; use on_bad_lines='warn' there
df

This gives me the data with leading whitespace as intended, but line 5 is gone. The output is:

b'Skipping line 5: expected 1 fields, saw 3\n'
 35 61  7 1 0              # with leading white spaces as intended
  0 1 1 1 1 1              # with leading white spaces as intended
33 221 22 0 1              # without leading white spaces
233   2                    # without leading white spaces
                           # 5th line disappeared (not my intention).
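For readers on current pandas, the same skip-the-bad-line behaviour can be sketched with on_bad_lines='skip' (available since pandas 1.3). This assumes an in-memory copy of the file via io.StringIO instead of example.txt; the result matches the output above: four rows, leading whitespace kept, line 5 dropped.

```python
import io

import pandas as pd

# Assumed in-memory copy of example.txt, annotations removed
text = (
    " 35 61  7 1 0\n"
    "  0 1 1 1 1 1\n"
    "33 221 22 0 1\n"
    "233   2\n"
    "1(01-02),2(02-03),3(03-04)\n"
)

# on_bad_lines='skip' drops line 5 (3 fields instead of 1) without a warning;
# the C parser keeps the leading whitespace of the remaining lines
df = pd.read_csv(io.StringIO(text), header=None, on_bad_lines="skip")
print(df[0].tolist())
```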

So I changed my code as below to get the 5th line:

import pandas as pd
df = pd.read_csv('example.txt', header=None, sep=':::', engine='python')
df

This successfully reads line 5, but the leading whitespace in lines 1 and 2 is gone:

35 61  7 1 0               # without leading white spaces (not my intention)
0 1 1 1 1 1                # without leading white spaces (not my intention)
33 221 22 0 1              # without leading white spaces
233   2                    # without leading white spaces
1(01-02),2(02-03),3(03-04) # I successfully got this line as intended.

I have seen several posts on preserving leading spaces in strings, but I can't find any on preserving leading whitespace with numbers. Thanks for your help.

Upvotes: 3

Views: 915

Answers (1)

cs95

Reputation: 402872

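The key is in the separator. If you specify sep='^', this works. A one-character separator is matched literally rather than as a regex, and since ^ never appears in the file, each line is read whole, leading whitespace included, as a single field.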

s = pd.read_csv('example.txt', header=None, sep='^').squeeze('columns')  # squeeze=True was removed in pandas 2.0

s

0                  35 61  7 1 0
1                   0 1 1 1 1 1
2                 33 221 22 0 1
3                       233   2
4    1(01-02),2(02-03),3(03-04)
Name: 0, dtype: object

s[1]
'  0 1 1 1 1 1'
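A self-contained sketch of the same approach, assuming pandas 2.x (hence .squeeze('columns') instead of the removed squeeze=True) and an io.StringIO copy of the file in place of example.txt. All five lines survive, including line 5, and the leading spaces in lines 1 and 2 are kept.

```python
import io

import pandas as pd

# Assumed in-memory copy of example.txt, annotations removed
text = (
    " 35 61  7 1 0\n"
    "  0 1 1 1 1 1\n"
    "33 221 22 0 1\n"
    "233   2\n"
    "1(01-02),2(02-03),3(03-04)\n"
)

# '^' is a single character, so it is used as a literal delimiter (C engine);
# it never occurs in the data, so each line is one field and commas are ignored
s = pd.read_csv(io.StringIO(text), header=None, sep="^").squeeze("columns")
print(repr(s[1]))
```

Since every value stays a string, any later numeric parsing (for example splitting on whitespace) can be done explicitly without losing the original layout.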

Upvotes: 3
