I have several columns in a DataFrame that I would like to combine into one column:
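import numpy as np
import pandas as pd
from functools import reduce  # Python 3.x
na = np.nan  # pd.np was removed in pandas 2.0, so use numpy directly
df1 = pd.DataFrame({'a': [na, 'B', na], 'b': ['A', na, na], 'c': [na, na, 'C']})
print(df1)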
     a    b    c
0  NaN    A  NaN
1    B  NaN  NaN
2  NaN  NaN    C
The output I am trying to get is supposed to look like (column name doesn't matter):
   a
0  A
1  B
2  C
I get ValueError: cannot index with vector containing NA / NaN values when I run this line of code:
reduce(lambda c1,c2: df1[c1].fillna(df1[c2]),df1.loc[:,'a':'c'])
However, it seems to work when I change the sequence argument of reduce to just two columns, df1.loc[:,'a':'b']:
reduce(lambda c1,c2: df1[c1].fillna(df1[c2]),df1.loc[:,'a':'b'])
0      A
1      B
2    NaN
Name: a, dtype: object
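A possible explanation, offered as a sketch rather than a confirmed diagnosis: iterating over a DataFrame yields its column labels, so the first call to the lambda receives the labels 'a' and 'b'. On the second call, c1 is no longer a label but the Series returned by fillna, and df1[c1] then tries to index with a Series that contains NaN. With only two columns there is no second call, which would explain why that case runs. One way around it, assuming the goal is to take the first non-missing value from left to right, is to reduce over the column Series themselves. The names columns and combined are made up for illustration.

```python
from functools import reduce

import numpy as np
import pandas as pd

na = np.nan
df1 = pd.DataFrame({'a': [na, 'B', na], 'b': ['A', na, na], 'c': [na, na, 'C']})

# Collect the columns as Series so that the accumulator and the next item
# are always Series, never a mix of column labels and Series.
columns = [df1[name] for name in df1.loc[:, 'a':'c']]

# Fill the gaps in the running result with values from each following column.
combined = reduce(lambda acc, col: acc.fillna(col), columns)
print(combined)
```

Because the accumulator is always a Series here, the same expression works for any number of columns.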
I've also tried the DataFrame/Series .combine method, but that produces the same error. I would like to get this working in case I ever want to replace values other than NaN:
reduce(lambda c1,c2: df1[c1].combine(df1[c2],(lambda x,y: y if x==na else x)),df1.loc[:,'a':'c'])
I don't think this works the way I hoped, though, because when I again restrict it to just two columns I get this output:
reduce(lambda c1,c2: df1[c1].combine(df1[c2],(lambda x,y: y if x==na else x)),df1.loc[:,'a':'b'])
0    NaN
1      B
2    NaN
dtype: object
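The two-column output above, which is just column a unchanged, hints at a second issue: NaN never compares equal to anything, including itself, so an == test against NaN is always False and the inner lambda always returns x. A minimal sketch that uses pd.isna for the check instead, again assuming the goal is the first non-missing value from left to right. The helper first_non_missing is a hypothetical name, not part of pandas.

```python
from functools import reduce

import numpy as np
import pandas as pd

na = np.nan
df1 = pd.DataFrame({'a': [na, 'B', na], 'b': ['A', na, na], 'c': [na, na, 'C']})

# NaN is not equal to itself, so an equality test can never detect it.
print(np.nan == np.nan)  # False


def first_non_missing(x, y):
    # Keep x unless it is missing, in which case fall back to y.
    return y if pd.isna(x) else x


# Reduce over the Series themselves so the accumulator is always a Series.
columns = [df1[name] for name in ['a', 'b', 'c']]
combined = reduce(lambda acc, col: acc.combine(col, first_non_missing), columns)
print(combined)
```

Swapping the pd.isna check for another condition would be the place to handle values other than NaN.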
Upvotes: 1
One way is to fill the NaNs with empty strings and sum over axis 1:
df1.fillna('').sum(1)
0 A
1 B
2 C
Option 2: use bfill along axis 1 and pick the first column:
df1.bfill(axis = 1).iloc[:, 0]
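The two options agree on the question's data, where each row holds exactly one value, but they differ once a row holds more than one: the sum concatenates the strings, while bfill keeps the leftmost value. A small sketch of that difference; df2 is a hypothetical frame invented for the comparison and does not come from the question.

```python
import numpy as np
import pandas as pd

na = np.nan
df1 = pd.DataFrame({'a': [na, 'B', na], 'b': ['A', na, na], 'c': [na, na, 'C']})

# Both options on the original data.
by_sum = df1.fillna('').sum(axis=1)
by_bfill = df1.bfill(axis=1).iloc[:, 0]

# Hypothetical frame whose first row has two non-missing values.
df2 = pd.DataFrame({'a': ['X', na], 'b': ['Y', 'Z']})
sum_two = df2.fillna('').sum(axis=1)        # concatenates the strings in a row
bfill_two = df2.bfill(axis=1).iloc[:, 0]    # keeps the leftmost non-missing value
print(sum_two.tolist())
print(bfill_two.tolist())
```

The sum approach also relies on every value being a string, so bfill is probably the safer choice for mixed or numeric columns.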
Upvotes: 2
This also works, as long as every row has exactly one non-NaN value:
pd.DataFrame(data=df1.stack().values, index=df1.index, columns=['a'])
Result:
   a
0  A
1  B
2  C
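The stack approach depends on the missing values being dropped, so that exactly one value per row survives. As far as I know, whether stack() drops them by default depends on the pandas version, so the sketch below calls dropna() explicitly and recovers the row index from the stacked result rather than reusing df1.index. The names stacked and result are illustrative only.

```python
import numpy as np
import pandas as pd

na = np.nan
df1 = pd.DataFrame({'a': [na, 'B', na], 'b': ['A', na, na], 'c': [na, na, 'C']})

# Stack to a Series indexed by (row, column) and drop any remaining NaNs.
stacked = df1.stack().dropna()

# Drop the column level so only the original row index is left,
# then turn the Series into a one-column DataFrame named 'a'.
result = stacked.droplevel(1).to_frame('a')
print(result)
```

If a row held two values, it would appear twice in result, which makes that case easy to spot instead of raising a length-mismatch error.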
Upvotes: 0