Reputation: 475
I have a dataframe structured as such:
I am wondering what the most efficient way is in pandas to create a new column "stage" that extracts any value that isn't 'None' in the four columns and use that value for the 'stage' column. The remaining four columns can then be dropped after the stage column has extracted out any value that isn't None in each row.
Here is another snapshot of the unique values of each column involved:
Please note that the values in the columns in question are string type and None isn't actually Nonetype.
Upvotes: 0
Views: 3952
Reputation: 107767
Consider combine_first
, assuming None is not a string literal 'None'
.
df['stage'] = df['doggo'].combine_first(df['floorfer'])\
.combine_first(df['pupper'])\
.combine_first(df['puppo'])
Alternatively, for DRY-er approach, use reduce
:
from functools import reduce
...
df['stage'] = reduce(lambda x,y: x.combine_first(y),
[df['doggo'], df['floorfer'], df['pupper'], df['puppo']])
Upvotes: 5
Reputation: 323396
df['New']=df[['A','B','C']].replace('None','').sum(1)
df
Out[1477]:
A B C New
0 None B None B
1 A None None A
2 None None C C
Data input
df=pd.DataFrame({'A':['None','A','None'],'B':['B','None','None'],'C':['None','None','C']})
Upvotes: 2