ComplexData
ComplexData

Reputation: 1093

Count unique elements from all the rows of a column

I have a data frame df with a column called "groups". It looks like below -

groups
[u'CN=MyUsers,OU=Groups,DC=sample,DC=com',u'CN=Sample-Users,OU=Groups,DC=sample,DC=com']
[u'CN=MyUsers,OU=Groups,DC=sample,DC=com',u'CN=Sample-Users,OU=Groups,DC=sample,DC=com',u'CN=MoreUsers,OU=Groups,DC=sample,DC=com']

First row contains 2 groups and 2nd row contains 3 groups. I want to make a count of each unique group in the whole column. So my resulting data frame should say -

Group                                         Count
u'CN=MyUsers,OU=Groups,DC=sample,DC=com'       2
u'CN=Sample-Users,OU=Groups,DC=sample,DC=com'  2
u'CN=MoreUsers,OU=Groups,DC=sample,DC=com'     1

How will I be able to achieve this task. I am trying-

res=df.groups.apply(pd.Series).stack().value_counts()

But this doesn't give me expected result. This doesn't break into counts of individual groups.

Upvotes: 0

Views: 379

Answers (2)

gold_cy
gold_cy

Reputation: 14226

This should work:

from itertools import chain

pd.DataFrame(map(lambda x: (x, 1), chain.from_iterable(df.groups.values))).groupby(0, as_index=False).sum().rename(columns={0:'Group', 1:'Count'})

                                        Group  Count
0     CN=MoreUsers,OU=Groups,DC=sample,DC=com      1
1       CN=MyUsers,OU=Groups,DC=sample,DC=com      2
2  CN=Sample-Users,OU=Groups,DC=sample,DC=com      2

Also like this:

pd.DataFrame(list(chain.from_iterable(df.groups.values)), columns=['Group']).Group.value_counts()

CN=Sample-Users,OU=Groups,DC=sample,DC=com    2
CN=MyUsers,OU=Groups,DC=sample,DC=com         2
CN=MoreUsers,OU=Groups,DC=sample,DC=com       1

Time tests:

%timeit pd.DataFrame(list(chain.from_iterable(df.groups.values)), columns=['Group']).Group.value_counts()

1000 loops, best of 3: 899 µs per loop

%timeit pd.DataFrame(list(chain.from_iterable(df.groups.values))).groupby(0, as_index=False).sum().rename(columns={0:'Group', 1:'Count'})

100 loops, best of 3: 5.5 ms per loop

Upvotes: 1

yuhow5566
yuhow5566

Reputation: 555

For each element in your column, you maybe can do something like this:

#if column1 = [u'CN=MyUsers,OU=Groups,DC=sample,DC=com',u'CN=Sample-Users,OU=Groups,DC=sample,DC=com']
#if column2 = [u'CN=MyUsers,OU=Groups,DC=sample,DC=com',u'CN=Sample-Users,OU=Groups,DC=sample,DC=com',u'CN=MoreUsers,OU=Groups,DC=sample,DC=com']

count_dict = dict()
for column in YOUR_DATA_SHEET:
    for element in column:
        if element in count_dict.keys():
            count_dict[element] += 1
        else:
            count_dict[element] = 1

So that you can have a dictionary which will record the element and its counts.

Upvotes: 1

Related Questions