Ted Mcgroder
Ted Mcgroder

Reputation: 55

Replacing an unknown number in Pandas data frame with previous number

I have some data frames I am trying to upload to a database. They are lists of values but some of the columns have the string 'null' in them and so this is causing errors.

so I would like to use a function to remove these 'null' strings and am trying to use replace to back fill them below:

df.replace("null", method = bfill)

but it is giving me the error message:

ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2

I have also tried putting "bfill" instead and it just replaced "null" with the string "bfill."

Any help appreciated.

Thanks.

Sorry should have provided an example:

1     6     11
2     7     12
null  null  null
4     9     14
5     10    15

Upvotes: 2

Views: 1185

Answers (2)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210832

Borrowing @jezrael's sample data set:

In [11]: df[df.ne('null')].bfill().ffill()
Out[11]:
   A  B  C
0  k  t  r
1  n  f  t
2  n  f  s
3  m  f  s
4  m  s  s
5  m  s  s

Upvotes: 1

jezrael
jezrael

Reputation: 862571

I think you need replace strings null to NaNs and then call bfill (fillna with method='bfill') and if some NaNs in the end of data add ffill for forward filling:

df = df.replace("null",np.nan).bfill().ffill()

But your error is obviously in read_csv function, check line 4 - parser need only one value and for some reason there are 2 values.

Sample:

df = pd.DataFrame({'A':['k','null','n','null','null','m'],
                   'B':['t','null','null','f','null','s'],
                   'C':['r','t','null','s','null','null']})

print (df)
      A     B     C
0     k     t     r
1  null  null     t
2     n  null  null
3  null     f     s
4  null  null  null
5     m     s  null

print (df.replace("null",np.nan))
     A    B    C
0    k    t    r
1  NaN  NaN    t
2    n  NaN  NaN
3  NaN    f    s
4  NaN  NaN  NaN
5    m    s  NaN
df1 = df.replace("null",np.nan).bfill()
print (df1)
   A  B    C
0  k  t    r
1  n  f    t
2  n  f    s
3  m  f    s
4  m  s  NaN
5  m  s  NaN

#if some `NaN`s in last row is necessary `ffill`
df2 = df.replace("null",np.nan).bfill().ffill()
print (df2)
   A  B  C
0  k  t  r
1  n  f  t
2  n  f  s
3  m  f  s
4  m  s  s
5  m  s  s

Upvotes: 2

Related Questions