Dawn.Sahil
Dawn.Sahil

Reputation: 105

Counting number of values in a column using groupby on a specific conditon in pandas

I have a dataframe which looks something like this:

dfA
name    field    country    action
Sam     elec     USA        POS
Sam     elec     USA        POS
Sam     elec     USA        NEG
Tommy   mech     Canada     NEG
Tommy   mech     Canada     NEG
Brian   IT       Spain      NEG
Brian   IT       Spain      NEG
Brian   IT       Spain      POS

I want to group the dataframe based on the first 3 columns adding a new column "No of data". This is something which I do using this:

dfB = dfA.groupby(["name", "field", "country"], dropna=False).size().reset_index(name = "No_of_data")

This gives me a new dataframe which looks something like this:

dfB
name    field    country   No_of_data
Sam     elec     USA           3
Tommy   mech     Canada        2
Brian   IT       Spain         3

But now I also want to add a new column to this particular dataframe which tells me what is the count of number of "POS" for every combination of "name", "field" and "country". Which should look something like this:

dfB
name    field    country   No_of_data   No_of_POS
Sam     elec     USA           3            2
Tommy   mech     Canada        2            0
Brian   IT       Spain         3            1

How do I add the new column (No_of_POS) to the table dfB when I dont have the information about "POS NEG" in it and needs to be taken from dfA.

Upvotes: 1

Views: 451

Answers (3)

sammywemmy
sammywemmy

Reputation: 28729

You can precompute the boolean before aggregating; performance should be better as the data size increases :

(df.assign(action = df.action.eq('POS'))
   .groupby(['name', 'field', 'country'], 
            sort = False, 
            as_index = False)
   .agg(no_of_data = ('action', 'size'), 
        no_of_pos = ('action', 'sum'))

    name field country  no_of_data  no_of_pos
0    Sam  elec     USA           3          2
1  Tommy  mech  Canada           2          0
2  Brian    IT   Spain           3          1

Upvotes: 2

Mykola Zotko
Mykola Zotko

Reputation: 17911

You can use a dictionary with functions in the aggregate method:

dfA.groupby(["name", "field", "country"], as_index=False)['action']\
.agg({'No_of_data': 'size', 'No_of_POS': lambda x: x.eq('POS').sum()})

Upvotes: 2

Vladislava Gonchar
Vladislava Gonchar

Reputation: 109

You can add an aggregation function when you're grouping your data. Check agg() function, maybe this will help.

Upvotes: 0

Related Questions