Reputation: 3
I'm a begginer in the subject and didn't find anything to help me here so far. I'm struggling in grouping my data and then filtering with a value I need.
Like the example,
I need to Know, for example, how many Red Cars Juan bought. (Red Cars sells for each client). When I try, I loose the group or the filter, I can't do both.
Can someone help me or suggest a post please?
Edit1.
With the help of the community, I find this as my solution:
df = df.loc[:, df.columns.intersection(['Name', 'Car colour', 'Amount'])]
df = df.query('Car colour == Red')
df.groupby(['Name', 'Car colour'])['Amount'].sum().reset_index()
Upvotes: 0
Views: 2539
Reputation: 5907
If you want to consider amount sold by group of Name and Car_color then try
df.groupby(['Name', 'Car colour'])['Amount'].sum().reset_index()
# Name Car colour Amount
0 Juan green 1
1 Juan red 3
2 Wilson blue 1
3 carlos yellow 1
Upvotes: 1
Reputation: 405925
You can group by multiple columns by passing a list of column names to the groupby
function, then taking the sum of each group.
import pandas as pd
df = pd.DataFrame({'Name': ['Juan', 'Wilson', 'Carlos', 'Juan', 'Juan', 'Wilson', 'Juan'],
'Car Color': ['Red', 'Blue', 'Yellow', 'Red', 'Red', 'Red', 'Green'],
'Amount': [1, 1, 1, 1, 1, 1, 1]})
print(df)
agg_df = df.groupby(['Name', 'Car Color']).sum()
print(agg_df)
Output:
Name Car Color
Carlos Yellow 1
Juan Green 1
Red 3
Wilson Blue 1
Red 1
Note that the resulting dataframe has a multi-index, so you can get the number of red cars that Juan bought by passing a tuple of values to loc
.
cars = agg_df.loc[[('Juan', 'Red')]]
print(cars)
Output:
Amount
Name Car Color
Juan Red 3
Upvotes: 1
Reputation: 2132
df.groupby(['Name','Car Color']).sum()
output:
import pandas as pd
data = {"Name": ["Juan", "Wilson", "Carlos", "Juan", "Juan", "Wilson", "Juan", "Carlos"],
"Car Color": ["Red", "Blue", "Yellow", "Red", "Red", "Red", "Red", "Green"],
"Amount": [24, 28, 40, 22, 29, 33, 31, 50]}
df = pd.DataFrame(data)
print(df)
Upvotes: 1