user16486968
user16486968

Reputation:

Formatting the columns when reading text file using pandas

I want to read a fixed width file.txt using pandas like this :

option19971675181       ACHILLE BLA BLA BLA1                                 blabla    88   498
option19971675182       ACHILLE BLA BLA BLA1                                  blabla   176   498
option19971675183       ACHILLE BLA BLA BLA1                                  blabla   191   498
option19971675184       ACHILLE BLA BLA BLA1                                  blabla   521   498
option19971675185       ACHILLE BLA BLA BLA1                                  blabla   919   498
option19971675186       ACHILLE BLA BLA BLA134234531                          blabla    10    498
option19971675187       ACHILLE BLA BLA BLA134234531 7 65                     blabla     0     0
option19971675188       ACHILLE BLA BLA BLA1342 90345 31                      blabla     0     0
option19971675189      ACHILLE BLA BLA BLA 134 23N 094 87OP531                blabla     0     0
option19971675190       ACHILLE BLA BLA BLA 134 23N 094 87OP53                blabla     0     0

I tried to read the file into pandas. The file has values separated by space

But i dont know how can i separate the text option199716751810 into 2 columns.

I used the code from the answer it work but not for the first line

 df = pd.read_csv("test.txt", delimiter ="\s\s+", header = None,error_bad_lines=False)
df[df.columns[0]] = df[df.columns[0]].str.replace("option199716","")

>>> df

I got this output

75181       ACHILLE BLA BLA BLA1                                 blabla    88   498
75182       ACHILLE BLA BLA BLA1                                  blabla   176   498
75183       ACHILLE BLA BLA BLA1                                  blabla   191   498
75184       ACHILLE BLA BLA BLA1                                  blabla   521   498
75185       ACHILLE BLA BLA BLA1                                  blabla   919   498
75186       ACHILLE BLA BLA BLA134234531                          blabla    10    498
75187       ACHILLE BLA BLA BLA134234531 7 65                     blabla     0     0
75188       ACHILLE BLA BLA BLA1342 90345 31                      blabla     0     0
75189      ACHILLE BLA BLA BLA 134 23N 094 87OP531                blabla     0     0
75190       ACHILLE BLA BLA BLA 134 23N 094 87OP53                blabla     0     0

But it still show error : Skipping line 16: Expected 5 fields in line 136, saw 6. Error could possibly be due to quotes being ignored when a multi-char delimiter is used. Can someone help to get this plz

Upvotes: 2

Views: 247

Answers (1)

Punker
Punker

Reputation: 1878

I would suggest you using str.slice() method (let's say df is your dataframe and option_column contains your option strings):

df['col A'] = df['option_column'].str.slice(stop=12)
df['col B'] = df['option_column'].str.slice(start=12)

Upvotes: 2

Related Questions