Reputation: 4498
In the following dataframe df:
Type Description Counts
A blue 34645
A red 45765
B red 36587
C green 42653
I want to implement a category hierarchy, but only for Type A.
I am using this code:
category_hierarchy = {
    'blue': 'in progress',
    'red': 'review'}
df['Category_Hierachy'] = df['Description'].replace(category_hierarchy)
However, this creates the following:
Type Description Counts Category_Hierachy
A blue 34645 in progress
A red 45765 review
B red 36587 review
C green 42653 green
instead of:
Type Description Counts Category_Hierachy
A blue 34645 in progress
A red 45765 review
B red 36587
C green 42653
How can I apply my code to only rows with Type A?
Thank You
Upvotes: 1
Views: 1107
Reputation: 1971
def custom_apply(row):
    if row['Type'] == 'A':
        return category_hierarchy[row['Description']]
    return ''

df['Category_Hierachy'] = df.apply(custom_apply, axis=1)
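or, after running your replace, blank out the other types with isin (df.loc is used here because df.ix was removed in pandas 1.0):
idx = df['Type'].isin(['B', 'C'])
df.loc[idx, "Category_Hierachy"] = ""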
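A self-contained sketch of both variants above, assuming the sample data from the question. It uses label-based .loc indexing (df.ix is not available in pandas 1.0 and later) and dict.get, so a Type A description missing from the dict gives '' rather than a KeyError. The names by_apply and by_mask are only for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Type": ["A", "A", "B", "C"],
    "Description": ["blue", "red", "red", "green"],
    "Counts": [34645, 45765, 36587, 42653],
})
category_hierarchy = {"blue": "in progress", "red": "review"}


def custom_apply(row):
    # Map only Type A rows; .get falls back to '' for unknown descriptions.
    if row["Type"] == "A":
        return category_hierarchy.get(row["Description"], "")
    return ""


# Variant 1: row-wise function.
by_apply = df.apply(custom_apply, axis=1)

# Variant 2: replace everything, then blank out the other types.
by_mask = df["Description"].replace(category_hierarchy)
by_mask.loc[df["Type"].isin(["B", "C"])] = ""

print(by_apply.tolist())
print(by_mask.tolist())
```

Both variants should give the same column for this data. The row-wise apply is usually slower on large frames, so the mask-based variant may be preferable there.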
Upvotes: 2
Reputation: 131640
Assuming email is a typo for df and I properly understand what you're trying to do: the blank spaces in the column you're adding will have to be filled with some value. You can't create a column that doesn't even have entries for certain rows. As long as that's okay, I recommend creating the new column with all values set to the "default" first, and only afterwards assigning to the rows where Type is 'A'.
In terms of code, you can create a new column filled with empty strings (substitute your default value of choice):
df['Category_Hierarchy'] = ''
Then you can reference only the rows of Type A by using a boolean index:
index = df['Type'] == 'A'
df.loc[index, 'Category_Hierarchy'] = ...
That last line will assign to only the cells in column Category_Hierarchy which are in rows where Type is 'A'.
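A minimal runnable sketch of this default-then-assign approach, using the question's sample data. The right-hand side of the last assignment is not given above; mapping Description through the question's category_hierarchy dict with Series.map is an assumption about what belongs there, and is_type_a is just an illustrative name for the boolean index.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Type": ["A", "A", "B", "C"],
    "Description": ["blue", "red", "red", "green"],
    "Counts": [34645, 45765, 36587, 42653],
})
category_hierarchy = {"blue": "in progress", "red": "review"}

# Step 1: create the column with the default value everywhere.
df["Category_Hierarchy"] = ""

# Step 2: boolean index selecting only the Type A rows.
is_type_a = df["Type"] == "A"

# Step 3 (assumed right-hand side): look up each Type A description in the dict.
df.loc[is_type_a, "Category_Hierarchy"] = (
    df.loc[is_type_a, "Description"].map(category_hierarchy)
)

print(df)
```

Note that Series.map leaves NaN for any Type A description that is not a key in the dict, so add a fallback if that can happen in your data.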
Upvotes: 1