BENY

Reputation: 323356

Weird ffill behavior when the DataFrame has duplicate column names

I have a DataFrame as below:


import pandas as pd
import numpy as np
df=pd.DataFrame({'A':[np.nan,1,1,np.nan],'B':[2,np.nan,2,2]},index=[1,1,2,2])
df.columns=['A','A']

Now I want to forward-fill the values within each group of the index. First I tried:

df.groupby(level=0).ffill()

which raises the following error:

> ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

It looks like a bug. I then tried apply, which returns the expected output:

df.groupby(level=0).apply(lambda x : x.ffill())
     A    A
1  NaN  2.0
1  1.0  2.0
2  1.0  2.0
2  1.0  2.0

For reference, when the column names are unique it runs without error; however, the result contains one additional column holding the index values, and that column's name is NaN (see question 2):

df.columns=['C','D']
df.groupby(level=0).ffill()
   NaN    C    D
1    1  NaN  2.0
1    1  1.0  2.0
2    2  1.0  2.0
2    2  1.0  2.0

Questions:
1. Is this a bug? Why does apply still work in this situation?

2. Why does groupby on the index followed by ffill create the additional column?

Upvotes: 14

Views: 355

Answers (1)

fpersyn

Reputation: 1096

It does look like a bug. Note that, according to the pandas documentation, the .ffill() method is a synonym for .fillna(method='ffill'). Using the latter produces your expected output for both of your examples in pandas version 0.23.4, without any errors or additional columns. Hope that helps.

import pandas as pd
import numpy as np
df=pd.DataFrame({'A':[np.nan,1,1,np.nan],'B':[2,np.nan,2,2]},index=[1,1,2,2])

df.columns=['A','A'] #dup column names
df.groupby(level=0).fillna(method='ffill')

Output:
    A   A
1   NaN 2.0
1   1.0 2.0
2   1.0 2.0
2   1.0 2.0
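If you would rather not depend on how a particular pandas version handles duplicate labels, another option is to sidestep them entirely. The following is only a sketch, not something taken from the pandas docs: it temporarily swaps the duplicate names for unique positional labels, forward-fills per index group, and then restores the original names. The names `tmp`, `filled` and `original_columns` are made up for illustration, and I have not checked this against every pandas release.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'A': [np.nan, 1, 1, np.nan], 'B': [2, np.nan, 2, 2]},
    index=[1, 1, 2, 2],
)
df.columns = ['A', 'A']  # duplicate column names, as in the question

# Remember the original (duplicate) labels so they can be restored later.
original_columns = df.columns

# Work on a copy whose columns are unique positional labels: 0, 1, ...
tmp = df.copy()
tmp.columns = range(tmp.shape[1])

# Forward-fill within each group of the index.
filled = tmp.groupby(level=0).ffill()

# Keep only the positional columns (this drops any extra column a
# particular pandas version might add), then restore the original names.
filled = filled.loc[:, list(tmp.columns)]
filled.columns = original_columns

print(filled)
```

Since the groupby only ever sees unique labels, the duplicate-name code path is never hit, whatever the underlying cause of the error turns out to be.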

Upvotes: 1
