Reputation:
I have a DataFrame as follows:
import pandas as pd

data = {
'values': ['abc and sea', 'def work', 'abc', 'ram cold', 'myran add'],
'gems': ['A1, A2, A3, A4', 'B1, A1, B2, B3', 'C1, A1', 'KS, KM, A1, A3', 'A2, A1, JP, CVK'],
'Name': ['Drama', 'Drama', 'Drama', 'Ant and Boat', 'Ant and Boat']
}
df = pd.DataFrame(data)
df
values gems Name
0 abc and sea A1, A2, A3, A4 Drama
1 def work B1, A1, B2, B3 Drama
2 abc C1, A1 Drama
3 ram cold KS, KM, A1, A3 Ant and Boat
4 myran add A2, A1, JP, CVK Ant and Boat
I want to count the unique values of df['gems'] for each df['Name']. The expected output is:
Name gems Count
0 Drama A1, A2, A3, A4, B1, B2, B3, C1 8
1 Ant and Boat KS, KM, A1, A3, A2, JP, CVK 7
Any help appreciated.
Upvotes: 0
Views: 89
Reputation: 893
new_df = pd.concat(
(
df['gems'].str.findall(r'[A-Z]+\d?').explode(),
df['Name']
), axis=1
).drop_duplicates().groupby('Name').agg(pd.Series.tolist)
new_df['Count'] = new_df['gems'].apply(len)
This extracts the gem codes from the gems column of the original DataFrame and explodes them, maps each gem to its associated name, removes the duplicate (gem, Name) pairs, implodes them back into lists, and then counts those lists.
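A minimal runnable sketch of those steps, using the sample data from the question (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'gems': ['A1, A2, A3, A4', 'B1, A1, B2, B3', 'C1, A1',
             'KS, KM, A1, A3', 'A2, A1, JP, CVK'],
    'Name': ['Drama', 'Drama', 'Drama', 'Ant and Boat', 'Ant and Boat'],
})

# Step 1: extract each gem code with a regex and explode to one row per gem;
# concat aligns the exploded gems with the Name of their original row.
pairs = pd.concat(
    (df['gems'].str.findall(r'[A-Z]+\d?').explode(), df['Name']),
    axis=1,
)

# Step 2: drop duplicate (gem, Name) pairs so each gem counts once per Name
unique_pairs = pairs.drop_duplicates()

# Step 3: implode back into one list per Name and count the list lengths
new_df = unique_pairs.groupby('Name').agg(pd.Series.tolist)
new_df['Count'] = new_df['gems'].apply(len)

print(new_df)
```

This yields a Count of 8 for Drama and 7 for Ant and Boat, matching the expected output (Name ends up as the index; add .reset_index() if you want it back as a column).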
Here's an alternative for the first statement:
new_df = (
df.assign(gems=df['gems'].str.findall(r'[A-Z]+\d?'))
.drop('values', axis=1)
.explode('gems')
.drop_duplicates()
.groupby('Name')
.agg(pd.Series.tolist)
)
Upvotes: 0
Reputation: 16162
You could get the set of the split/exploded values in a groupby, then take the len of that column.
df = df.groupby('Name').agg({'gems': lambda x: set(x.str.split(', ').explode())}).reset_index()
df['count'] = df['gems'].apply(len)
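Applied to the question's sample data, this approach looks like the following sketch (using `out` instead of reassigning df, purely for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'gems': ['A1, A2, A3, A4', 'B1, A1, B2, B3', 'C1, A1',
             'KS, KM, A1, A3', 'A2, A1, JP, CVK'],
    'Name': ['Drama', 'Drama', 'Drama', 'Ant and Boat', 'Ant and Boat'],
})

# For each Name, split the comma-separated strings, flatten them with
# explode, and collect the unique gems into a set.
out = (
    df.groupby('Name')
      .agg({'gems': lambda x: set(x.str.split(', ').explode())})
      .reset_index()
)

# The count is just the size of each set
out['count'] = out['gems'].apply(len)

print(out)
```

One caveat: a set is unordered, so the gems column won't preserve the first-seen order shown in the expected output; if order matters, collecting via list(dict.fromkeys(...)) on the exploded values keeps it while still removing duplicates.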
Upvotes: 0