Reputation: 152
I have two columns in a DataFrame. One will be used to group the data, and for the other I want the value counts within each group.
One column, 'Assigned', contains repeated strings; this column will be used to group the data.
The other column, 'Acquired', contains either 0 or 1, and I want to count how many 0s and 1s there are in each group.
I would like to store the counts for each group in two dictionaries, one for the 0s and one for the 1s.
My dataframe looks like this:
df
Assigned  Acquired
foo       1
bar       1
baz       0
foo       1
foo       0
baz       0
bar       1
foo       1
bar       0
baz       0
baz       0
This is what I have tried:
df_acq = df.groupby('Assigned')
df_acq.value_counts('Acquired')
The output of the above code is:
Assigned  Acquired
foo       0           1
          1           3
bar       0           1
          1           2
baz       0           4
          1           0
Now I want to convert this Series into two dictionaries, which would ideally look like this:
Acquired_0 = {
'foo': 1,
'bar': 1,
'baz': 4
}
Acquired_1 = {
'foo': 3,
'bar': 2,
'baz': 0,
}
I thought .to_dict() might work, but it produces tuple keys, two for each 'Assigned' value, for example ('foo', 0): 1 and ('foo', 1): 3. This is a problem because I am eventually going to add these dictionaries as node attributes in networkx, so the keys must be the 'Assigned' values alone.
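A minimal sketch of one way around the tuple keys, assuming the sample rows from the table above. `unstack()` moves the `Acquired` level of the MultiIndex into columns, and `fill_value=0` supplies a count for pairs that never occur, such as `baz` with `Acquired == 1`. This is one possible approach, not the only one:

```python
import pandas as pd

# Sample rows, assumed from the table in the question
df = pd.DataFrame({
    "Assigned": ["foo", "bar", "baz", "foo", "foo", "baz",
                 "bar", "foo", "bar", "baz", "baz"],
    "Acquired": [1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0],
})

# Series indexed by (Assigned, Acquired) pairs; calling to_dict() on it
# gives the tuple keys described above, e.g. ('foo', 0)
counts = df.groupby("Assigned")["Acquired"].value_counts()

# Move the Acquired level into columns; pairs that never occur become 0
table = counts.unstack(fill_value=0)

# One plain dict per Acquired value, keyed only by the Assigned strings
Acquired_0 = table[0].to_dict()
Acquired_1 = table[1].to_dict()

print(Acquired_0)
print(Acquired_1)
```

Dicts keyed by node like these are the shape that `networkx.set_node_attributes(G, values, name=...)` accepts; that step is not exercised here.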
Upvotes: 1
Views: 812
Reputation: 136
This is how I create a dummy DataFrame:
import pandas as pd
data = [['foo',1],['bar',1],['baz',0],['foo',1], ['foo',0], ['baz',1],['foo',0]]
df = pd.DataFrame(data, columns=['Assigned', 'Acquired'])
df.head(10)
The DataFrame looks like this:
  Assigned  Acquired
0      foo         1
1      bar         1
2      baz         0
3      foo         1
4      foo         0
5      baz         1
6      foo         0
Now, to count the 1s we can do this:
df_acq = df.groupby('Assigned').sum()
acq_dict_1 = df_acq.to_dict()['Acquired']
print(acq_dict_1)
The output looks like:
{'bar': 1, 'baz': 1, 'foo': 2}
For the 0s we can do this:
df_acq = df.groupby('Assigned').count() - df.groupby('Assigned').sum()
acq_dict_0 = df_acq.to_dict()['Acquired']
print(acq_dict_0)
The output looks like this:
{'bar': 0, 'baz': 1, 'foo': 2}
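The subtraction works because `Acquired` only holds 0 and 1: `sum()` counts the 1s and `count()` counts all rows, so the difference is the number of 0s. A hedged alternative, assuming the same dummy data, counts each value directly by summing a 0/1 mask per group, which does not rely on the column containing only 0s and 1s:

```python
import pandas as pd

# Same dummy data as in this answer
data = [['foo', 1], ['bar', 1], ['baz', 0], ['foo', 1],
        ['foo', 0], ['baz', 1], ['foo', 0]]
df = pd.DataFrame(data, columns=['Assigned', 'Acquired'])

# 1 where Acquired is 0, else 0; summing per group counts the zeros
acq_dict_0 = (df['Acquired'] == 0).astype(int).groupby(df['Assigned']).sum().to_dict()
# Same idea for the ones
acq_dict_1 = (df['Acquired'] == 1).astype(int).groupby(df['Assigned']).sum().to_dict()

print(acq_dict_0)
print(acq_dict_1)
```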
Upvotes: 0
Reputation: 120
It is best to split the data into two separate DataFrames:
df_0 = df[df.Acquired==0]
df_1 = df[df.Acquired==1]
And group them:
Acquired_0 = df_0.groupby('Assigned').count().to_dict()['Acquired']
Acquired_1 = df_1.groupby('Assigned').count().to_dict()['Acquired']
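One caveat with filtering first: a group with no matching rows disappears from the result. With the question's data, `baz` never has `Acquired == 1`, so it would be missing from `Acquired_1` instead of mapped to 0. A sketch of one fix, assuming the question's sample rows, reindexes each count against every `Assigned` value with `fill_value=0` (the names `raw_1` and `all_assigned` are illustrative):

```python
import pandas as pd

# Sample rows, assumed from the table in the question
df = pd.DataFrame({
    "Assigned": ["foo", "bar", "baz", "foo", "foo", "baz",
                 "bar", "foo", "bar", "baz", "baz"],
    "Acquired": [1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0],
})

df_0 = df[df.Acquired == 0]
df_1 = df[df.Acquired == 1]

# Without reindexing, 'baz' is absent because df_1 has no 'baz' rows
raw_1 = df_1.groupby('Assigned')['Acquired'].count().to_dict()

# Every label present in the full DataFrame
all_assigned = df['Assigned'].unique()

# Reindex so that missing groups get an explicit count of 0
Acquired_0 = (df_0.groupby('Assigned')['Acquired'].count()
              .reindex(all_assigned, fill_value=0).to_dict())
Acquired_1 = (df_1.groupby('Assigned')['Acquired'].count()
              .reindex(all_assigned, fill_value=0).to_dict())

print(raw_1)
print(Acquired_0)
print(Acquired_1)
```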
Upvotes: 0
Reputation: 59579
Perhaps settle for a dict of dicts instead of an arbitrary number of variables. The outer keys are the unique Acquired values:
import pandas as pd
d = pd.crosstab(df.Acquired, df.Assigned).to_dict(orient='index')
#{0: {'bar': 1, 'baz': 4, 'foo': 1}, 1: {'bar': 2, 'baz': 0, 'foo': 3}}
# If you know there are only two Acquired values (0 and 1):
Acquired_0, Acquired_1 = pd.crosstab(df.Acquired, df.Assigned).to_dict(orient='index').values()
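`crosstab` counts every (Acquired, Assigned) pair and fills pairs that never occur with 0, which is why `baz` gets 0 under key 1. As a sketch, assuming the question's sample rows, swapping the two arguments and using the default `to_dict()` orientation (which keys the outer dict by column label) yields the same dict of dicts:

```python
import pandas as pd

# Sample rows, assumed from the table in the question
df = pd.DataFrame({
    "Assigned": ["foo", "bar", "baz", "foo", "foo", "baz",
                 "bar", "foo", "bar", "baz", "baz"],
    "Acquired": [1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0],
})

# Rows are Acquired values, columns are Assigned strings
by_index = pd.crosstab(df.Acquired, df.Assigned).to_dict(orient='index')

# Transposed table: default orient='dict' uses the column labels
# (here the Acquired values) as the outer keys
by_column = pd.crosstab(df.Assigned, df.Acquired).to_dict()

print(by_index)
print(by_index == by_column)
```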
Upvotes: 1