Reduce multiple columns into one using pandas

Question

I have several columns in a DataFrame that I would like to combine into one column:

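from functools import reduce  # Python 3.x
import numpy as np
import pandas as pd
na = np.nan  # pd.np is no longer available in recent pandas versions
df1 = pd.DataFrame({'a': [na, 'B', na], 'b': ['A', na, na], 'c': [na, na, 'C']})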
print(df1)
     a    b    c
0  NaN    A  NaN
1    B  NaN  NaN
2  NaN  NaN    C

The output I want looks like this (the column name doesn't matter):

  a
0 A
1 B
2 C

I get ValueError: cannot index with vector containing NA / NaN values when I run this line of code:

reduce(lambda c1,c2: df1[c1].fillna(df1[c2]),df1.loc[:,'a':'c'])

However, it seems to work when I change the sequence argument of reduce to just two columns df1.loc[:,'a':'b']:

reduce(lambda c1,c2: df1[c1].fillna(df1[c2]),df1.loc[:,'a':'b'])
0      A
1      B
2    NaN
Name: a, dtype: object

I've also tried the DataFrame/Series .combine method, but that produces the same error. I would like to get this approach working in case I ever need to replace values other than NaN:

reduce(lambda c1,c2: df1[c1].combine(df1[c2],(lambda x,y: y if x==pd.np.nan else x)),df1.loc[:,'a':'c'])

I don't think this works the way I hoped, though, because when I again restrict it to just two columns I get this output:

reduce(lambda c1,c2: df1[c1].combine(df1[c2],(lambda x,y: y if x==pd.np.nan else x)),df1.loc[:,'a':'b'])
0    NaN
1      B
2    NaN
dtype: object

Vaishali · Accepted Answer

One way is to replace NaN with empty strings and sum across axis 1, which concatenates the strings in each row:

df1.fillna('').sum(axis=1)

0    A
1    B
2    C
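
A minimal, self-contained sketch of this first option follows. It rebuilds the frame from the question (with object dtype, matching the output shown there). The second frame, `df2`, is a hypothetical addition used only to illustrate a caveat: the sum is string concatenation, so it gives a single value only when each row has at most one non-null entry.

```python
import numpy as np
import pandas as pd

# Frame from the question; object dtype matches the output shown there.
df1 = pd.DataFrame(
    {
        "a": [np.nan, "B", np.nan],
        "b": ["A", np.nan, np.nan],
        "c": [np.nan, np.nan, "C"],
    },
    dtype=object,
)

# NaN -> '' so the row-wise sum becomes plain string concatenation.
combined = df1.fillna("").sum(axis=1)
print(combined.tolist())  # ['A', 'B', 'C']

# Hypothetical frame with two non-null values in one row:
# the values are joined together rather than one being chosen.
df2 = pd.DataFrame({"a": ["X"], "b": ["Y"]}, dtype=object)
joined = df2.fillna("").sum(axis=1)
print(joined.tolist())  # ['XY']
```

If rows can hold more than one non-null value and only the first should be kept, the back-fill approach is the safer choice.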

Option 2: back-fill along axis 1 and pick the first column:

df1.bfill(axis=1).iloc[:, 0]
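
The `reduce` attempts from the question can also be made to work. The sketch below is one possible repair, not the only one. Iterating over a DataFrame yields column labels, so after the first step the accumulator is already a Series and `df1[c1]` tries to index with it, which raises the error. Passing the columns themselves as Series avoids that. The `.combine` version additionally needs `pd.isna(x)`, because NaN never compares equal to anything, including itself. The names `cols`, `first_valid`, `coalesced`, and `via_combine` are illustrative.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Frame from the question; object dtype matches the output shown there.
df1 = pd.DataFrame(
    {
        "a": [np.nan, "B", np.nan],
        "b": ["A", np.nan, np.nan],
        "c": [np.nan, np.nan, "C"],
    },
    dtype=object,
)

# Option 2: back-fill across columns, then take the first column.
first_valid = df1.bfill(axis=1).iloc[:, 0]

# reduce over the column Series (not the labels), so the accumulator
# and each new element are both Series at every step.
cols = [df1[name] for name in df1.columns]
coalesced = reduce(lambda acc, col: acc.fillna(col), cols)

# Same idea with .combine; pd.isna replaces the `x == nan` test,
# which is always False.
via_combine = reduce(
    lambda acc, col: acc.combine(col, lambda x, y: y if pd.isna(x) else x),
    cols,
)

print(first_valid.tolist())  # ['A', 'B', 'C']
print(coalesced.tolist())    # ['A', 'B', 'C']
print(via_combine.tolist())  # ['A', 'B', 'C']
```

The `.combine` form is the one to adapt when the value to replace is something other than NaN: only the condition inside the inner lambda needs to change.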
