cebs
cebs

Reputation: 43

Add column based on condition to create top_10

I have the following df.head():

enter image description here

There are 30 different cost_center_id / cost_center_name.

I already groupby and sum by cost_center_name to check the top 10 cost centers.

complete_clean.groupby(['cost_center_name', 'cost_center_id'])['value'].sum().reset_index().sort_values(by=['value']).tail(10)

Now that I have the cost_center_id of the top 10 I want to create a column where I check if the cost_center_id is in the top 10 or not.

I'm trying:

top_10 = [21, 24, 23, 7, 4, 12, 20, 5, 11, 13]

condition = complete_clean['cost_center_id'] in top_10

complete_clean['cost_center_grouped'] = np.where(condition, complete_clean['cost_center_name'], 'Outros')

and I get:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

How can I solve this? Thanks in advance.

Upvotes: 0

Views: 37

Answers (2)

Bradon
Bradon

Reputation: 233

That's a great start. Without seeing more of your data and error message I'm not exactly sure where your problem is. It could be with your

condition = complete_clean['cost_center_id'] in top_10

Here is the solution I came up with. I made up a little mini DataFrame to test the solution on.

complete_clean = pd.DataFrame({'cost_center_id':[21,22,23,24,25,7,9],
                               'cost_center_name':['is_in','not_in','is_in','is_in','not_in','is_in','not_in']})

top_10 = [21, 24, 23, 7, 4, 12, 20, 5, 11, 13]

complete_clean['top_ten'] = complete_clean['cost_center_id'].isin(top_10)

complete_clean['top_ten_name'] = np.where(complete_clean['top_ten'], complete_clean['cost_center_name'], 'Outros')

complete_clean

Upvotes: 2

XXavier
XXavier

Reputation: 1226

Try this please complete_clean['cost_center_id'].isin(top_10)

Upvotes: 1

Related Questions