Reputation: 409
I have a dataset 'df' like so:
a b c e f g
1 Nan Nan 2 3 4
Nan 1 Nan 2 3 4
Nan Nan 1 2 3 4
I want to create a new column 'd' in my dataframe 'df' like so:
a b c d e f g
1 Nan Nan 1 2 3 4
Nan 1 Nan 1 2 3 4
Nan Nan 1 1 2 3 4
So basically, 'd' is the amalgamation of columns 'a', 'b', and 'c' for any non-NaN values those three columns have. Between 'a', 'b', and 'c', if one column is non-NaN, the other columns are NaN. I need columns 'e', 'f', and 'g' in my dataset, but they do not have anything to do with constructing column 'd'
I created this function below:
def func(df):
if df['a'] is not None:
return df['a']
if df['b'] is not None:
return df['b']
if df['c'] is not None:
return df['c']
and then used concatenated_df['d']=concatenated_df.apply(func, axis=1)
to apply the conditional statements for each row of the dataframe. However, I am only getting the first if statement to successfully execute. The rest returns NaN values to the new column 'd'
I also tried modifying the function to
def func(df):
if df['a'] is not None:
return df['a']
elif df['b'] is not None:
return df['b']
elif df['c'] is not None:
return df['c']
but it still only executed the first conditional statement
Upvotes: 0
Views: 188
Reputation: 57033
One more option not mentioned in the duplicated question:
df['d'] = df['a'].combine_first(df['b']).combine_first(df['c'])
#0 1.0
#1 1.0
#2 1.0
Upvotes: 1
Reputation: 51165
You can use first_valid_index
df['d'] = df.apply(lambda x: x[x.first_valid_index()], axis=1)
a b c d
0 1 NaN NaN 1
1 NaN 1 NaN 1
2 NaN NaN 1 1
Upvotes: 2