jeangelj

Reputation: 4498

Python pandas implement conditional categories

In the following dataframe df:

Type  Description Counts
A        blue      34645
A        red       45765
B        red       36587
C        green     42653

I want to implement a category hierarchy, but only for Type A.

I am using this code:

category_hierarchy={
'blue':'in progress',
'red':'review'}

df['Category_Hierachy'] = df['Description'].replace(category_hierarchy)

However, this creates the following:

Type  Description Counts  Category_Hierachy
A        blue      34645    in progress
A        red       45765    review
B        red       36587    review
C        green     42653    green

INSTEAD OF

Type  Description Counts  Category_Hierachy
A        blue      34645    in progress
A        red       45765    review
B        red       36587    
C        green     42653

How can I apply my code to only rows with Type A?

Thank You

Upvotes: 1

Views: 1107

Answers (2)

Mo...

Reputation: 1971

Using apply

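def custom_apply(row):
    # Map the description only for Type A rows; every other row stays blank.
    if row['Type'] == 'A':
        # .get avoids a KeyError when a description is missing from the dict
        return category_hierarchy.get(row['Description'], '')
    return ''

# axis=1 passes each row (not each column) to custom_apply
df['Category_Hierachy'] = df.apply(custom_apply, axis=1)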

or

Using isin

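# Run this after the replace() call from the question, then blank out the other types.
idx = df['Type'].isin(['B', 'C'])
df.loc[idx, "Category_Hierachy"] = ""  # .loc replaces .ix, which was removed from pandas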
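The isin variant only blanks out rows, so it assumes the column was already filled by the replace call from the question. A minimal end-to-end sketch of that two-step flow, with the sample frame rebuilt by hand from the question's table (the construction of df is an assumption, not part of the original answer):

```python
import pandas as pd

# Sample data rebuilt from the table in the question (assumed construction).
df = pd.DataFrame({
    "Type": ["A", "A", "B", "C"],
    "Description": ["blue", "red", "red", "green"],
    "Counts": [34645, 45765, 36587, 42653],
})
category_hierarchy = {"blue": "in progress", "red": "review"}

# Step 1: map every row, exactly as in the question.
df["Category_Hierachy"] = df["Description"].replace(category_hierarchy)

# Step 2: blank out the rows whose Type is B or C.
idx = df["Type"].isin(["B", "C"])
df.loc[idx, "Category_Hierachy"] = ""

print(df)
```

Listing the types to blank out works for this small table; if new types can appear, the inverse test ~df["Type"].isin(["A"]) may be the safer mask.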

Upvotes: 2

David Z

Reputation: 131640

Assuming email is a typo for df and I properly understand what you're trying to do: the blank spaces in the column you're adding will have to be filled with some value. You can't create a column that doesn't even have entries for certain rows. As long as that's okay, I recommend creating the new column with all values set to the "default" first, and only afterwards assigning to the rows where Type is 'A'.

In terms of code, you can create a new column filled with empty strings as

df['Category_Hierarchy'] = ''

(substitute your default value of choice) and then you can reference only the rows of type A by using a boolean index.

index = df['Type'] == 'A'
df.loc[index, 'Category_Hierarchy'] = ...

That last line will assign to only the cells in column Category_Hierarchy which are in rows where Type is 'A'.
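The default-then-assign approach can be sketched end to end as follows. The right-hand side of the final assignment is left open in the answer; using Series.map with the question's dictionary is one possible choice and is an assumption here, as is rebuilding df by hand from the question's table.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (assumed construction).
df = pd.DataFrame({
    "Type": ["A", "A", "B", "C"],
    "Description": ["blue", "red", "red", "green"],
    "Counts": [34645, 45765, 36587, 42653],
})
category_hierarchy = {"blue": "in progress", "red": "review"}

# Start with the default value in every row.
df["Category_Hierarchy"] = ""

# Boolean index selecting only the Type A rows.
index = df["Type"] == "A"

# Assumed right-hand side: map the Type A descriptions through the dict.
# map() yields NaN for descriptions missing from the dict, so fall back to "".
mapped = df.loc[index, "Description"].map(category_hierarchy).fillna("")
df.loc[index, "Category_Hierarchy"] = mapped

print(df)
```

Because the assignment goes through df.loc with the boolean index, rows of Type B and C keep the default empty string and are never touched by the mapping.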

Upvotes: 1
