GuiR
GuiR

Reputation: 3

Group by and Filter with Pandas without loosing groupby

I'm a begginer in the subject and didn't find anything to help me here so far. I'm struggling in grouping my data and then filtering with a value I need.

Like the example,

Table

I need to Know, for example, how many Red Cars Juan bought. (Red Cars sells for each client). When I try, I loose the group or the filter, I can't do both.

Can someone help me or suggest a post please?

Edit1.

With the help of the community, I find this as my solution:

df = df.loc[:, df.columns.intersection(['Name', 'Car colour', 'Amount'])]

df = df.query('Car colour == Red')

df.groupby(['Name', 'Car colour'])['Amount'].sum().reset_index()

Upvotes: 0

Views: 2539

Answers (3)

Satya
Satya

Reputation: 5907

If you want to consider amount sold by group of Name and Car_color then try

df.groupby(['Name', 'Car colour'])['Amount'].sum().reset_index()
#    Name   Car colour  Amount
 0    Juan      green       1
 1    Juan        red       3
 2  Wilson       blue       1
 3  carlos     yellow       1

Upvotes: 1

Bill the Lizard
Bill the Lizard

Reputation: 405925

You can group by multiple columns by passing a list of column names to the groupby function, then taking the sum of each group.

import pandas as pd

df = pd.DataFrame({'Name': ['Juan', 'Wilson', 'Carlos', 'Juan', 'Juan', 'Wilson', 'Juan'],
                   'Car Color': ['Red', 'Blue', 'Yellow', 'Red', 'Red', 'Red', 'Green'],
                   'Amount': [1, 1, 1, 1, 1, 1, 1]})
print(df)

agg_df = df.groupby(['Name', 'Car Color']).sum()
print(agg_df)

Output:

Name   Car Color        
Carlos Yellow          1
Juan   Green           1
       Red             3
Wilson Blue            1
       Red             1

Note that the resulting dataframe has a multi-index, so you can get the number of red cars that Juan bought by passing a tuple of values to loc.

cars = agg_df.loc[[('Juan', 'Red')]]
print(cars)

Output:

                Amount
Name Car Color        
Juan Red             3

Upvotes: 1

Piotr Żak
Piotr Żak

Reputation: 2132

GroupBy.sum

df.groupby(['Name','Car Color']).sum()

output:

enter image description here

import pandas as pd

data = {"Name": ["Juan", "Wilson", "Carlos", "Juan", "Juan", "Wilson", "Juan", "Carlos"],
        "Car Color": ["Red", "Blue", "Yellow", "Red", "Red", "Red", "Red", "Green"],
        "Amount": [24, 28, 40, 22, 29, 33, 31, 50]}
df = pd.DataFrame(data)
print(df)

Upvotes: 1

Related Questions