swang16

Reputation: 879

Python Pandas: Create new column out of other columns where value is not null

I have a data frame like this

----------------
RecID|A   |B
----------------
1    |NaN |x
2    |y   |NaN
3    |z   |NaN
4    |NaN |a
5    |NaN |b

I want to create a new column, C, from A and B: where A is null, take the value from B, and where B is null, take the value from A:

----------------------
RecID|A   |B    |C 
----------------------
1    |NaN | x   |x
2    |y   | NaN |y 
3    |z   | NaN |z
4    |NaN | a   |a
5    |NaN | b   |b

Lastly, is there an efficient way to do this with more than two columns? For example, I have columns A-Z and want to create a new column A1 out of columns A-Z in the same way as above.

Upvotes: 6

Views: 9474

Answers (2)

piRSquared

Reputation: 294218

pandas
lookup
This is the generalizable solution the OP was looking for; it works across an arbitrary number of columns by picking, for each row, the first column that is not null.

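# Note: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0.
# For each row, find the label of the first non-null column in A..B.
lookup = df.loc[:, 'A':'B'].notnull().idxmax(1)
# Pull the value at (row label, column label) for every row.
df.assign(A1=df.lookup(lookup.index, lookup.values))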

   RecID    A    B A1
0      1  NaN    x  x
1      2    y  NaN  y
2      3    z  NaN  z
3      4  NaN    a  a
4      5  NaN    b  b
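
On pandas versions where DataFrame.lookup is unavailable, the same idea can be sketched with plain NumPy indexing. This is only one possible replacement, not an official drop-in; the variable names (cols, first_valid, col_pos, row_pos) are made up for illustration, and the frame is rebuilt from the question's table.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question's table.
df = pd.DataFrame({
    "RecID": [1, 2, 3, 4, 5],
    "A": [np.nan, "y", "z", np.nan, np.nan],
    "B": ["x", np.nan, np.nan, "a", "b"],
})

cols = df.loc[:, "A":"B"]
# Label of the first non-null column in each row.
first_valid = cols.notnull().idxmax(axis=1)
# Translate column labels to integer positions.
col_pos = cols.columns.get_indexer(first_valid.to_numpy())
row_pos = np.arange(len(cols))
# Pick exactly one value per row from the underlying array.
result = df.assign(A1=cols.to_numpy()[row_pos, col_pos])
print(result)
```

A row in which every column is null falls back to the first column, so A1 stays NaN for that row.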

fillna

df.assign(C=df.A.fillna(df.B))

   RecID    A    B  C
0      1  NaN    x  x
1      2    y  NaN  y
2      3    z  NaN  z
3      4  NaN    a  a
4      5  NaN    b  b

mask

df.assign(C=df.A.mask(df.A.isnull(), df.B))

   RecID    A    B  C
0      1  NaN    x  x
1      2    y  NaN  y
2      3    z  NaN  z
3      4  NaN    a  a
4      5  NaN    b  b

combine_first

df.assign(C=df.A.combine_first(df.B))

   RecID    A    B  C
0      1  NaN    x  x
1      2    y  NaN  y
2      3    z  NaN  z
3      4  NaN    a  a
4      5  NaN    b  b

numpy
np.where

df.assign(C=np.where(df.A.notnull(), df.A, df.B))

   RecID    A    B  C
0      1  NaN    x  x
1      2    y  NaN  y
2      3    z  NaN  z
3      4  NaN    a  a
4      5  NaN    b  b
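
The two-column variants above can be stretched to more columns by chaining them. A minimal sketch, assuming a hypothetical three-column frame and using functools.reduce to fold combine_first over the columns from left to right (the names source_cols and first_non_null are illustrative only):

```python
from functools import reduce

import numpy as np
import pandas as pd

# Hypothetical data: three source columns instead of two.
df = pd.DataFrame({
    "A": [np.nan, "y", np.nan],
    "B": ["x", np.nan, np.nan],
    "C": ["p", "q", "r"],
})

source_cols = ["A", "B", "C"]
# Start from the first column and fill its gaps from each later column in turn.
combined = reduce(
    lambda acc, name: acc.combine_first(df[name]),
    source_cols[1:],
    df[source_cols[0]],
)
result = df.assign(first_non_null=combined)
print(result)
```

Because earlier columns take priority, each row ends up with its leftmost non-null value.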

Upvotes: 13

DYZ

Reputation: 57033

In the case of multiple columns, you can use forward fill. This example assumes that you want to build a combination of all columns 'A' through 'Z':

df['AZ'] = df.loc[:, 'A':'Z'].ffill(axis=1)['Z']

This method works for two columns, too:

df['C'] = df.loc[:, 'A':'B'].ffill(axis=1)['B']
#0    x
#1    y
#2    z
#3    a
#4    b
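
One thing to keep in mind: when a row has more than one non-null value, forward fill keeps the last one, while a backward fill read from the first column keeps the first one. The sketch below contrasts the two on a small made-up frame (the data and the names block, last_non_null, first_non_null are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data where some rows have several non-null values.
df = pd.DataFrame({
    "A": [np.nan, "y", np.nan],
    "B": ["x", np.nan, np.nan],
    "C": ["p", "q", "r"],
})

block = df.loc[:, "A":"C"]
# Forward fill along each row, then read the last column: last non-null value.
last_non_null = block.ffill(axis=1).iloc[:, -1]
# Backward fill along each row, then read the first column: first non-null value.
first_non_null = block.bfill(axis=1).iloc[:, 0]
print(pd.DataFrame({"last": last_non_null, "first": first_non_null}))
```

With the question's data each row has exactly one non-null value, so both variants give the same result.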

Upvotes: 7
