PagMax
PagMax

Reputation: 8568

Combine pandas column into new column in certain order of preference

This is largely a follow up question from this one.

Combine Pandas data frame column values into new column

I want to combine certain columns (not all) of a dateframe into a single column and pick which data to use in certain order of preference and ignore rest.

For instance I have this

df=pd.DataFrame({'A': ['alpha','beta', np.nan, np.nan],
'B':['alpha_1',np.nan, np.nan, 'delta_1'],
'C': [np.nan,'beta_2', 'gamma_2', 'delta_2']})

which gives

              A        B        C
        0  alpha  alpha_1      NaN
        1   beta      NaN   beta_2
        2    NaN      NaN  gamma_2
        3    NaN  delta_1  delta_2

I want to combine them such that it will first prefer value in column A and if that is NaN or null, it will look for value in column B and if not finally in column C.
so the final output in Column D will be like

              A        B        C        D
        0  alpha  alpha_1      NaN       alpha
        1   beta      NaN    beta_2      beta
        2    NaN      NaN    gamma_2     gamma_2
        3   NaN     delta_1  delta_2     delta_1

I can do df.fillna('').sum(axis=1) as suggested in the solution but I am thinking before that I should make sure every row has only non NaN value.

I am trying with boolean masking using isnull() on each column but not able to progress too far on that. Any help would be great.

Upvotes: 3

Views: 303

Answers (2)

jezrael
jezrael

Reputation: 862611

Solution with bfill what is same as fillna with method='bfill'

df['D'] = df.bfill(axis=1).iloc[:, 0]
print (df)
       A        B        C        D
0  alpha  alpha_1      NaN    alpha
1   beta      NaN   beta_2     beta
2    NaN      NaN  gamma_2  gamma_2
3    NaN  delta_1  delta_2  delta_1

df['D'] = df.fillna(method='bfill',axis=1).iloc[:, 0]
print (df)
       A        B        C        D
0  alpha  alpha_1      NaN    alpha
1   beta      NaN   beta_2     beta
2    NaN      NaN  gamma_2  gamma_2
3    NaN  delta_1  delta_2  delta_1

Another solution with first_valid_index and apply:

df['D'] = df.apply(lambda x: x[x.first_valid_index()], axis=1)
print (df)
       A        B        C        D
0  alpha  alpha_1      NaN    alpha
1   beta      NaN   beta_2     beta
2    NaN      NaN  gamma_2  gamma_2
3    NaN  delta_1  delta_2  delta_1

Upvotes: 4

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210832

In [338]: df['D'] = df.T.bfill().iloc[0]

In [339]: df
Out[339]:
       A        B        C        D
0  alpha  alpha_1      NaN    alpha
1   beta      NaN   beta_2     beta
2    NaN      NaN  gamma_2  gamma_2
3    NaN  delta_1  delta_2  delta_1

Upvotes: 3

Related Questions