Vid
Vid

Reputation: 141

Pandas read_csv Multiple spaces delimiter

I have a file with 7 aligned columns, with empty cells.Example:

SN 1995ap 0.230  40.44 0.46  0.00 silver 
SN 1995ao 0.300  40.76 0.60  0.00 silver 
SN 1995ae 0.067  37.54 0.34  0.00 silver 
SN 1995az 0.450  42.13 0.21       gold   
SN 1995ay 0.480  42.37 0.20       gold   
SN 1995ax 0.615  42.85 0.23       gold 

I want to read it using pandas.read_csv(), but I have some trouble. The separator can be either 1 or 2 spaces. If I use sep='\s+' it works, but it ignores empty cells, therefore I get cells shifted to the left and empty cells in the last columns. I tried to use regex separator sep=\s{1,2}, but i get the following error:

pandas.errors.ParserError: Expected 7 fields in line 63, saw 9. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.

My code:

import pandas as pd

riess_2004b=pd.read_csv('Riess_2004b.txt', skiprows=22, header=None, sep='\s{1,2}', engine='python')

What I am not getting right?

Upvotes: 1

Views: 2148

Answers (2)

Code Different
Code Different

Reputation: 93161

Fix-width file (read_fwf) seems like a better fit for your case:

df = pd.read_fwf("Riess_2004b.txt", colspecs="infer", header=None)

Upvotes: 3

Ynjxsjmh
Ynjxsjmh

Reputation: 30032

If there is no extra spaces in your field value and no continuous empty values in one row, you can try delim_whitespace argument and then shift the NAN part to left by one column.

df = pd.read_csv('xx', delim_whitespace=True)


def shift(col):
    m = col.isna().shift(-1, fill_value=False)
    col = col.fillna(method='ffill')
    col[m] = pd.NA
    return col

df = df.T.apply(shift, axis=0).T
print(df)

   SN  1995ap  0.230  40.44  0.46  0.00  silver
0  SN  1995ao    0.3  40.76   0.6  0.00  silver
1  SN  1995ae  0.067  37.54  0.34  0.00  silver
2  SN  1995az   0.45  42.13  0.21  <NA>    gold
3  SN  1995ay   0.48  42.37   0.2  <NA>    gold
4  SN  1995ax  0.615  42.85  0.23  <NA>    gold

Upvotes: 0

Related Questions