Reputation: 4498
Working with the DataFrame df in Python pandas:
Product_ID | Category   | Sub_Cat
32432      | 0          | Gadget
24085      | Big Tech   | Computer
54398      | Small Tech | Gadget
97456      | 0          | Computer
I want to add a new column that takes the Category value when it is not 0 and falls back to the Sub_Cat value otherwise.
This is the output I am looking for:
Product_ID | Category   | Sub_Cat  | Cat_for_Analysis
32432      | 0          | Gadget   | Gadget
24085      | Big Tech   | Computer | Big Tech
54398      | Small Tech | Gadget   | Small Tech
97456      | 0          | Computer | Computer
Thank You!
Upvotes: 0
Views: 151
Reputation: 816
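You could use apply for this too. Since Category mixes text with zeros, the check below treats both the integer 0 and the string "0" as empty:
df["Cat_for_Analysis"] = df.apply(lambda row: row["Sub_Cat"] if row["Category"] in (0, "0") else row["Category"], axis=1)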
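For anyone who wants to try the apply approach end to end, here is a minimal sketch. The sample frame is rebuilt from the question, and storing the placeholder as the string "0" is an assumption about how the data was loaded. The helper name pick_category is made up for illustration.

```python
import pandas as pd

# Sample frame rebuilt from the question. Storing the placeholder as the
# string "0" is an assumption; it could just as well be the integer 0.
df = pd.DataFrame({
    "Product_ID": [32432, 24085, 54398, 97456],
    "Category": ["0", "Big Tech", "Small Tech", "0"],
    "Sub_Cat": ["Gadget", "Computer", "Gadget", "Computer"],
})

def pick_category(row):
    # Hypothetical helper: treat both 0 and "0" as "no category set"
    # and fall back to the sub-category in that case.
    if row["Category"] in (0, "0"):
        return row["Sub_Cat"]
    return row["Category"]

# axis=1 passes each row to the helper as a Series.
df["Cat_for_Analysis"] = df.apply(pick_category, axis=1)
print(df)
```

Row-wise apply is easy to read but calls the Python function once per row, so it tends to be slower than a vectorized approach on large frames.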
Upvotes: 1
Reputation: 33793
Using np.where:
import numpy as np
df['Cat_for_Analysis'] = np.where(df['Category'] == '0', df['Sub_Cat'], df['Category'])
Or equivalently the negated version, if it makes more intuitive sense based on your problem:
df['Cat_for_Analysis'] = np.where(df['Category'] != '0', df['Category'], df['Sub_Cat'])
The resulting output for either method:
   Product_ID    Category   Sub_Cat Cat_for_Analysis
0       32432           0    Gadget           Gadget
1       24085    Big Tech  Computer         Big Tech
2       54398  Small Tech    Gadget       Small Tech
3       97456           0  Computer         Computer
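A self-contained sketch of the np.where approach, in case you are not sure whether the placeholder is the integer 0 or the string '0'. The sample frame and the is_missing name are assumptions for illustration. Series.mask is shown alongside as a pandas-only alternative that should give the same values.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; the string "0" placeholder
# is an assumption about how the data was loaded.
df = pd.DataFrame({
    "Product_ID": [32432, 24085, 54398, 97456],
    "Category": ["0", "Big Tech", "Small Tech", "0"],
    "Sub_Cat": ["Gadget", "Computer", "Gadget", "Computer"],
})

# Boolean mask that is True where Category holds the placeholder,
# whether it was read as the integer 0 or the string "0".
is_missing = df["Category"].isin([0, "0"])

# np.where(condition, value_if_true, value_if_false), element by element.
df["Cat_for_Analysis"] = np.where(is_missing, df["Sub_Cat"], df["Category"])

# pandas-only variant: replace masked entries with the Sub_Cat values.
alt = df["Category"].mask(is_missing, df["Sub_Cat"])

print(df)
```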
Upvotes: 1
Reputation: 323246
You can use bfill along the columns after replacing '0' with np.nan:
df['Cat_for_Analysis']=df.replace('0',np.nan)[['Category','Sub_Cat']].bfill(axis=1).iloc[:,0]
df
Out[876]:
   Product_ID   Category   Sub_Cat Cat_for_Analysis
0       32432          0    Gadget           Gadget
1       24085    BigTech  Computer          BigTech
2       54398  SmallTech    Gadget        SmallTech
3       97456          0  Computer         Computer
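The one-liner packs three steps together, so here is a rough step-by-step sketch of the same idea. The sample frame is rebuilt from the question with the placeholder assumed to be the string '0', and the intermediate names cols and filled are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; the string "0" placeholder
# is an assumption about how the data was loaded.
df = pd.DataFrame({
    "Product_ID": [32432, 24085, 54398, 97456],
    "Category": ["0", "Big Tech", "Small Tech", "0"],
    "Sub_Cat": ["Gadget", "Computer", "Gadget", "Computer"],
})

# Step 1: keep only the two columns involved and turn "0" into NaN.
cols = df[["Category", "Sub_Cat"]].replace("0", np.nan)

# Step 2: back-fill along each row (axis=1), so a missing Category
# is filled from the Sub_Cat value to its right.
filled = cols.bfill(axis=1)

# Step 3: the first column now holds Category where it was set and
# Sub_Cat otherwise.
df["Cat_for_Analysis"] = filled["Category"]

print(df)
```

Restricting the replace to the two relevant columns also avoids touching other columns that might legitimately contain '0'.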
Upvotes: 1