Reputation: 43
I have the following df.head():
There are 30 different cost_center_id / cost_center_name.
I already groupby and sum by cost_center_name to check the top 10 cost centers.
complete_clean.groupby(['cost_center_name', 'cost_center_id'])['value'].sum().reset_index().sort_values(by=['value']).tail(10)
Now that I have the cost_center_id of the top 10 I want to create a column where I check if the cost_center_id is in the top 10 or not.
I'm trying:
top_10 = [21, 24, 23, 7, 4, 12, 20, 5, 11, 13]
condition = complete_clean['cost_center_id'] in top_10
complete_clean['cost_center_grouped'] = np.where(condition, complete_clean['cost_center_name'], 'Outros')
and I get:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
How can I solve this? Thanks in advance.
Upvotes: 0
Views: 37
Reputation: 233
That's a great start. Without seeing more of your data and error message I'm not exactly sure where your problem is. It could be with your
condition = complete_clean['cost_center_id'] in top_10
Here is the solution I came up with. I made up a little mini DataFrame to test the solution on.
complete_clean = pd.DataFrame({'cost_center_id':[21,22,23,24,25,7,9],
'cost_center_name':['is_in','not_in','is_in','is_in','not_in','is_in','not_in']})
top_10 = [21, 24, 23, 7, 4, 12, 20, 5, 11, 13]
complete_clean['top_ten'] = complete_clean['cost_center_id'].isin(top_10)
complete_clean['top_ten_name'] = np.where(complete_clean['top_ten'], complete_clean['cost_center_name'], 'Outros')
complete_clean
Upvotes: 2