cecilj

Reputation: 152

Getting Value Counts of a Column Based on a Groupby

I have a dataframe with two columns: one is used to group the data, and for the other I want the value counts within each group.

One of the columns, 'Assigned', contains various strings that repeat; this column will be used to group the data.

The other column, 'Acquired', consists of either 0 or 1 and I want to count how many 0s and 1s there are for each group.

I would like to store the count for each group in two dictionaries, one for 0s and the other for 1s.

My dataframe looks like this:

df
    Assigned    Acquired
    foo         1
    bar         1
    baz         0
    foo         1
    foo         0
... baz         0         ...
    bar         1
    foo         1
    bar         0
    baz         0
    baz         0

This is what I have tried:

df_acq = df.groupby('Assigned')
df_acq.value_counts('Acquired')

The output of the above code is:

Assigned    Acquired
foo            0       1 
               1       3
bar            0       1
               1       2 
baz            0       4
               1       0

Now, I want to be able to take this series object and convert it to two dictionaries. This would ideally look like:

Acquired_0 = {
    'foo': 1,
    'bar': 1,
    'baz': 4,
}

Acquired_1 = {
    'foo': 3,
    'bar': 2,
    'baz': 0,
}

I thought .to_dict() might work, but it creates two keys for each 'Assigned' value, for example ('foo', 0): 1 and ('foo', 1): 3. This is a problem because I will eventually add these dictionaries as node attributes in networkx, so the keys must be exactly the 'Assigned' values.
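
For reference, a minimal reproduction of the tuple-key behaviour. The sample rows below are hypothetical (the elided rows of the frame above are not reconstructed), and the variable names counts and flat are only for illustration:

```python
import pandas as pd

# Hypothetical sample rows, not the full original frame.
df = pd.DataFrame({
    "Assigned": ["foo", "bar", "baz", "foo", "foo", "bar"],
    "Acquired": [1, 1, 0, 1, 0, 0],
})

# Counting per group gives a Series indexed by (Assigned, Acquired) pairs.
counts = df.groupby("Assigned")["Acquired"].value_counts()

# to_dict() keeps that two-level index, so every key is a tuple.
flat = counts.to_dict()
print(flat)
```

Each key in flat is an (Assigned, Acquired) pair rather than the bare 'Assigned' string, which is the problem described above.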

Upvotes: 1

Views: 812

Answers (3)

shariful

Reputation: 136

This is how I create a dummy DataFrame:

import pandas as pd

# Small sample frame: one grouping column and one 0/1 column
data = [['foo',1],['bar',1],['baz',0],['foo',1], ['foo',0], ['baz',1],['foo',0]]
df = pd.DataFrame(data, columns=['Assigned', 'Acquired'])

df.head(10)

Now for counting 1s we can do this:

df_acq = df.groupby('Assigned').sum()
acq_dict_1 = df_acq.to_dict()['Acquired']
print(acq_dict_1)

The output looks like:

{'bar': 1, 'baz': 1, 'foo': 2}

For 0s we can do this:

df_acq = df.groupby('Assigned').count() - df.groupby('Assigned').sum()
acq_dict_0 = df_acq.to_dict()['Acquired']
print(acq_dict_0)

The output looks like this:

{'bar': 0, 'baz': 1, 'foo': 2}
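
The two steps above can also be sketched with a single groupby object. This is only a sketch and assumes, as the question states, that 'Acquired' holds nothing but 0 and 1; with any other value the sum would no longer equal the number of 1s. The names grouped, ones and zeros are introduced here for illustration:

```python
import pandas as pd

data = [['foo', 1], ['bar', 1], ['baz', 0], ['foo', 1],
        ['foo', 0], ['baz', 1], ['foo', 0]]
df = pd.DataFrame(data, columns=['Assigned', 'Acquired'])

# Group once and reuse the grouped column for both counts.
grouped = df.groupby('Assigned')['Acquired']

# Assumption: values are strictly 0 or 1, so the sum equals the number of 1s.
ones = grouped.sum()

# Whatever is left of the non-null rows in each group must be 0s.
zeros = grouped.count() - ones

acq_dict_1 = ones.to_dict()
acq_dict_0 = zeros.to_dict()
print(acq_dict_1)
print(acq_dict_0)
```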

Upvotes: 0

WhiteHat

Reputation: 120

It is best to split the data into two separate dataframes:

df_0 = df[df.Acquired==0]
df_1 = df[df.Acquired==1]

And group them:

Acquired_0 = df_0.groupby('Assigned').count().to_dict()['Acquired']
Acquired_1 = df_1.groupby('Assigned').count().to_dict()['Acquired']
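
One caveat: a group that has no rows in the filtered frame is dropped from the result instead of appearing with a count of 0. A possible way around that, sketched below on a hypothetical frame in which 'baz' never has Acquired == 1, is to reindex against all group labels. The helper counts_for and the variable all_groups are illustrative names, not part of the original answer:

```python
import pandas as pd

# Hypothetical frame: 'baz' has no row with Acquired == 1,
# and 'bar' has no row with Acquired == 0.
df = pd.DataFrame({
    "Assigned": ["foo", "bar", "baz", "foo", "baz"],
    "Acquired": [1, 1, 0, 0, 0],
})

# Every label that should appear as a key, even with a zero count.
all_groups = df["Assigned"].unique()

def counts_for(value):
    """Count rows with Acquired == value per group; absent groups get 0."""
    subset = df[df["Acquired"] == value]
    per_group = subset.groupby("Assigned").size()
    return per_group.reindex(all_groups, fill_value=0).to_dict()

Acquired_0 = counts_for(0)
Acquired_1 = counts_for(1)
print(Acquired_0)
print(Acquired_1)
```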

Upvotes: 0

ALollz

Reputation: 59579

Perhaps settle for a dict of dicts instead of an arbitrary number of variables. The keys are the unique 'Acquired' values:

import pandas as pd

d = pd.crosstab(df.Acquired, df.Assigned).to_dict(orient='index')
#{0: {'bar': 1, 'baz': 4, 'foo': 1}, 1: {'bar': 2, 'baz': 0, 'foo': 3}}

# If you know there are only 2:
Acquired_0, Acquired_1 = pd.crosstab(df.Acquired, df.Assigned).to_dict(orient='index').values()
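
If you would rather start from the groupby in the question, a similar table can be sketched with value_counts followed by unstack. This is a different technique from crosstab, shown only as an alternative. The frame below is an assumption built to match the counts printed in the question, and table is an illustrative name:

```python
import pandas as pd

# Assumed rows, chosen to match the counts shown in the question:
# foo -> one 0 and three 1s, bar -> one 0 and two 1s, baz -> four 0s.
df = pd.DataFrame({
    "Assigned": ["foo"] * 4 + ["bar"] * 3 + ["baz"] * 4,
    "Acquired": [0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0],
})

# Rows are 'Assigned' labels, columns are the 'Acquired' values;
# combinations that never occur are filled with 0.
table = (
    df.groupby("Assigned")["Acquired"]
      .value_counts()
      .unstack(fill_value=0)
)

# Column lookup raises KeyError if a value (0 or 1) never occurs at all.
Acquired_0 = table[0].to_dict()
Acquired_1 = table[1].to_dict()
print(Acquired_0)
print(Acquired_1)
```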

Upvotes: 1
