Reputation: 582
I have the following dataframe:
ID Col1 Col2
1 "A" "Z"
1 "A" "Y"
1 "B" "Z"
2 "A" "X"
2 "C" "P"
I want to convert this into a list of dicts, one dict per ID, holding the counts of the values that appear in Col1 and Col2 for that ID:
[{"A" : 2, "B" : 1, "Z" : 2, "Y" : 1}, {"A" : 1, "C" : 1, "X" : 1, "P" : 1}]
Is there any way to achieve that? My dataframe is quite big.
Upvotes: 1
Views: 606
Reputation: 109526
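Assume your dataframe is named df. You can get the row labels for each ID using df.groupby('ID').groups (with the default integer index, these labels are also the row positions, which is what the iloc calls below rely on):
group_rows = df.groupby('ID').groups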
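To make the shape of this mapping concrete, here is a small sketch that rebuilds the question's sample data and prints the rows in each group. The construction of df and the name rows_as_lists are assumptions for illustration, not part of the original post.

```python
import pandas as pd

# Sample data copied from the question (default integer index 0..4).
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "Col1": ["A", "A", "B", "A", "C"],
    "Col2": ["Z", "Y", "Z", "X", "P"],
})

group_rows = df.groupby("ID").groups

# Convert to plain Python types so the mapping prints cleanly.
rows_as_lists = {
    int(group_id): rows.tolist() for group_id, rows in group_rows.items()
}
print(rows_as_lists)  # {1: [0, 1, 2], 2: [3, 4]}
```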
We'll iterate through each group ID and use Counter to count the values in Col1 and Col2. I'll then add these to a dictionary.
from collections import Counter
my_dict = {}
# .items() replaces the Python 2-only .iteritems()
for group_id, rows in group_rows.items():
    c = Counter(df.iloc[rows, 1])  # 1 = index number for `Col1`
    c.update(df.iloc[rows, 2])  # 2 = index number for `Col2`
    my_dict[group_id] = dict(c)
>>> my_dict
{1: {'A': 2, 'B': 1, 'Y': 1, 'Z': 2},
2: {'A': 1, 'C': 1, 'P': 1, 'X': 1}}
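For anyone who wants to run this end to end, below is a minimal self-contained sketch of the same Counter idea. It is a variation rather than the exact code above: it iterates over the groupby object directly and selects columns by name, so it does not depend on the dataframe having a default integer index. The helper name count_values_by_id and its parameters are hypothetical.

```python
from collections import Counter

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "Col1": ["A", "A", "B", "A", "C"],
    "Col2": ["Z", "Y", "Z", "X", "P"],
})


def count_values_by_id(frame, id_col="ID", value_cols=("Col1", "Col2")):
    """Return {group ID: {value: count}}, pooling counts over value_cols."""
    counts = {}
    for group_id, group in frame.groupby(id_col):
        c = Counter()
        for col in value_cols:
            c.update(group[col])  # count this column's values for the group
        counts[group_id] = dict(c)
    return counts


my_dict = count_values_by_id(df)
my_list = [my_dict[k] for k in sorted(my_dict)]  # one dict per ID, in ID order
print(my_list)
# [{'A': 2, 'B': 1, 'Z': 2, 'Y': 1}, {'A': 1, 'C': 1, 'X': 1, 'P': 1}]
```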
I chose to output the results to a dictionary instead of your requested list so that the relationship between the group ID and the counted values is explicit. If you need a list, you can build one by sorting the group IDs and collecting the dictionaries in that order:
keys = sorted(my_dict)  # in Python 3, dict.keys() returns a view with no .sort()
my_list = [my_dict[k] for k in keys]
>>> my_list
[{'A': 2, 'B': 1, 'Y': 1, 'Z': 2}, {'A': 1, 'C': 1, 'P': 1, 'X': 1}]
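Since the question mentions a large dataframe, one alternative that may be worth timing against the Counter loop is to reshape to long form with melt and let pandas do the counting. This is a different technique from the one above, sketched under the assumption that the columns are named ID, Col1 and Col2 as in the question; the names long_df and counts are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "Col1": ["A", "A", "B", "A", "C"],
    "Col2": ["Z", "Y", "Z", "X", "P"],
})

# Stack Col1 and Col2 into a single "value" column, keeping ID alongside.
long_df = df.melt(id_vars="ID", value_vars=["Col1", "Col2"], value_name="value")

# Count occurrences of each (ID, value) pair; the result has a MultiIndex.
counts = long_df.groupby(["ID", "value"]).size()

# Split the counts back into one plain dict per ID.
my_dict = {
    group_id: sub.droplevel("ID").to_dict()
    for group_id, sub in counts.groupby(level="ID")
}
my_list = [my_dict[k] for k in sorted(my_dict)]
```

The resulting my_dict and my_list hold the same counts as the Counter version, although the key order inside each dict may differ.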
Upvotes: 1