Reputation: 879
I have a data frame like this:
----------------
RecID | A   | B
----------------
1     | NaN | x
2     | y   | NaN
3     | z   | NaN
4     | NaN | a
5     | NaN | b
And I want to create a new column, C, from A and B such that if A is null then fill with B and if B is null then fill with A:
----------------------
RecID | A   | B   | C
----------------------
1     | NaN | x   | x
2     | y   | NaN | y
3     | z   | NaN | z
4     | NaN | a   | a
5     | NaN | b   | b
Lastly, is there an efficient way to do this if I have more than two columns? For example, I have columns A-Z and want to create a new column A1 out of columns A-Z, similar to the above.
Upvotes: 6
Views: 9474
Reputation: 294218
pandas
lookup
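This is the generalizable solution the OP was looking for, and it works across an arbitrary number of columns. Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, so this snippet only runs on older versions.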
lookup = df.loc[:, 'A':'B'].notnull().idxmax(1)
df.assign(A1=df.lookup(lookup.index, lookup.values))
RecID A B A1
0 1 NaN x x
1 2 y NaN y
2 3 z NaN z
3 4 NaN a a
4 5 NaN b b
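On pandas versions where `DataFrame.lookup` is no longer available, the same idea (pick the first non-null value in each row) can be sketched with plain NumPy indexing. This is only one possible replacement, not the original answer's code; the frame below is rebuilt from the question's data and the helper names `mask`, `col_pos`, and `values` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (assumed layout).
df = pd.DataFrame({
    "RecID": [1, 2, 3, 4, 5],
    "A": [np.nan, "y", "z", np.nan, np.nan],
    "B": ["x", np.nan, np.nan, "a", "b"],
})

cols = df.loc[:, "A":"B"]

# Boolean array: True where a value is present.
mask = cols.notna().to_numpy()

# Position of the first True in each row (0 if the whole row is null).
col_pos = mask.argmax(axis=1)

# Pick one value per row using (row, column) integer pairs.
values = cols.to_numpy()[np.arange(len(cols)), col_pos]

result = df.assign(A1=values)
print(result)
```

A row with no non-null value at all falls back to position 0, so it simply yields the NaN from the first column.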
fillna
df.assign(C=df.A.fillna(df.B))
RecID A B C
0 1 NaN x x
1 2 y NaN y
2 3 z NaN z
3 4 NaN a a
4 5 NaN b b
mask
df.assign(C=df.A.mask(df.A.isnull(), df.B))
RecID A B C
0 1 NaN x x
1 2 y NaN y
2 3 z NaN z
3 4 NaN a a
4 5 NaN b b
combine_first
df.assign(C=df.A.combine_first(df.B))
RecID A B C
0 1 NaN x x
1 2 y NaN y
2 3 z NaN z
3 4 NaN a a
4 5 NaN b b
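For more than two columns, `combine_first` can be chained so that each column fills the gaps left by the ones before it. Here is a rough sketch using `functools.reduce`; the three-column frame and the `columns` list are invented for illustration and are not from the question.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Hypothetical frame with three candidate columns.
df = pd.DataFrame({
    "A": [np.nan, "y", np.nan],
    "B": ["x", np.nan, np.nan],
    "D": [np.nan, "q", "w"],
})

columns = ["A", "B", "D"]  # order decides which column wins

# Fold left to right: keep what is already there, fill gaps from the next column.
combined = reduce(
    lambda left, right: left.combine_first(right),
    (df[c] for c in columns),
)

df = df.assign(C=combined)
print(df)
```

Because the fold runs left to right, earlier columns take priority whenever a row has more than one non-null value.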
numpy
np.where
df.assign(C=np.where(df.A.notnull(), df.A, df.B))
RecID A B C
0 1 NaN x x
1 2 y NaN y
2 3 z NaN z
3 4 NaN a a
4 5 NaN b b
Upvotes: 13
Reputation: 57033
In the case of multiple columns, you can use forward fill along each row: the last non-null value is carried into the final column, which you then select. This example assumes that you want to build a combination of all columns 'A' through 'Z':
df['AZ'] = df.loc[:, 'A':'Z'].ffill(axis=1)['Z']
This method works for two columns, too:
df['C'] = df.loc[:, 'A':'B'].ffill(axis=1)['B']
#0 x
#1 y
#2 z
#3 a
#4 b
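One thing to keep in mind: forward fill ends up with the last non-null value in each row. When every row has exactly one non-null value, as in the question, that is also the first one. If a row can hold several values and the first is wanted, a backward fill read from the first column is one option. A small sketch with made-up data (the last row has two values); it calls `ffill`/`bfill` directly, since `fillna(method=...)` is deprecated in recent pandas:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row has values in both columns.
df = pd.DataFrame({
    "A": [np.nan, "y", "z"],
    "B": ["x", np.nan, "k"],
})

block = df.loc[:, "A":"B"]

# Forward fill along rows, then read the last column: last non-null per row.
last_valid = block.ffill(axis=1)["B"]

# Backward fill along rows, then read the first column: first non-null per row.
first_valid = block.bfill(axis=1)["A"]

print(pd.DataFrame({"last": last_valid, "first": first_valid}))
```

The two results differ only in the last row, where both columns are populated.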
Upvotes: 7