Reputation: 1093
I have a data frame df with a column called "groups". It looks like below -
groups
[u'CN=MyUsers,OU=Groups,DC=sample,DC=com',u'CN=Sample-Users,OU=Groups,DC=sample,DC=com']
[u'CN=MyUsers,OU=Groups,DC=sample,DC=com',u'CN=Sample-Users,OU=Groups,DC=sample,DC=com',u'CN=MoreUsers,OU=Groups,DC=sample,DC=com']
First row contains 2 groups and 2nd row contains 3 groups. I want to make a count of each unique group in the whole column. So my resulting data frame should say -
Group Count
u'CN=MyUsers,OU=Groups,DC=sample,DC=com' 2
u'CN=Sample-Users,OU=Groups,DC=sample,DC=com' 2
u'CN=MoreUsers,OU=Groups,DC=sample,DC=com' 1
How will I be able to achieve this task. I am trying-
res=df.groups.apply(pd.Series).stack().value_counts()
But this doesn't give me expected result. This doesn't break into counts of individual groups.
Upvotes: 0
Views: 379
Reputation: 14226
This should work:
from itertools import chain
pd.DataFrame(map(lambda x: (x, 1), chain.from_iterable(df.groups.values))).groupby(0, as_index=False).sum().rename(columns={0:'Group', 1:'Count'})
Group Count
0 CN=MoreUsers,OU=Groups,DC=sample,DC=com 1
1 CN=MyUsers,OU=Groups,DC=sample,DC=com 2
2 CN=Sample-Users,OU=Groups,DC=sample,DC=com 2
Also like this:
pd.DataFrame(list(chain.from_iterable(df.groups.values)), columns=['Group']).Group.value_counts()
CN=Sample-Users,OU=Groups,DC=sample,DC=com 2
CN=MyUsers,OU=Groups,DC=sample,DC=com 2
CN=MoreUsers,OU=Groups,DC=sample,DC=com 1
Time tests:
%timeit pd.DataFrame(list(chain.from_iterable(df.groups.values)), columns=['Group']).Group.value_counts()
1000 loops, best of 3: 899 µs per loop
%timeit pd.DataFrame(list(chain.from_iterable(df.groups.values))).groupby(0, as_index=False).sum().rename(columns={0:'Group', 1:'Count'})
100 loops, best of 3: 5.5 ms per loop
Upvotes: 1
Reputation: 555
For each element in your column, you maybe can do something like this:
#if column1 = [u'CN=MyUsers,OU=Groups,DC=sample,DC=com',u'CN=Sample-Users,OU=Groups,DC=sample,DC=com']
#if column2 = [u'CN=MyUsers,OU=Groups,DC=sample,DC=com',u'CN=Sample-Users,OU=Groups,DC=sample,DC=com',u'CN=MoreUsers,OU=Groups,DC=sample,DC=com']
count_dict = dict()
for column in YOUR_DATA_SHEET:
for element in column:
if element in count_dict.keys():
count_dict[element] += 1
else:
count_dict[element] = 1
So that you can have a dictionary which will record the element and its counts.
Upvotes: 1