user2110417
user2110417

Reputation:

How to get unique elements and their counts in each category from a dataframe?

I have dataframe as follows:

data = {
    'values': ['abc and sea', 'def work', 'abc', 'ram cold', 'myran add'],
    'gems': ['A1, A2, A3, A4', 'B1, A1, B2, B3', 'C1, A1', 'KS, KM, A1, A3', 'A2, A1, JP, CVK'],
    'Name': ['Drama', 'Drama', 'Drama', 'Ant and Boat', 'Ant and Boat']
}
df = pd.DataFrame(data)

df

    values        gems              Name
0   abc and sea   A1, A2, A3, A4    Drama
1   def work      B1, A1, B2, B3    Drama
2   abc           C1, A1            Drama
3   ram cold      KS, KM, A1, A3    Ant and Boat
4   myran add     A2, A1, JP, CVK   Ant and Boat

I want count unique df['gems'] for each df['Name']. The expected output is:

    Name            gems                             Count
0   Drama           A1, A2, A3, A4, B1, B2, B3, C1   8
1   Ant and Boat    KS, KM, A1, A3, A2, JP, CVK      7

Any help appreciated..

Upvotes: 0

Views: 89

Answers (2)

Steele Farnsworth
Steele Farnsworth

Reputation: 893

new_df = pd.concat(
    (
        df['gems'].str.findall(r'[A-Z]+\d?').explode(),
        df['Name']
    ), axis=1
).drop_duplicates().groupby('Name').agg(pd.Series.tolist)

new_df['Count'] = new_df['gems'].apply(len)

This explodes the gems column of the original DataFrame, maps it to its associated name, removes the duplicate matches, implodes them back into lists, and then counts those lists.

Here's an alternative for the first statement:

new_df = (
    df.assign(gems=df['gems'].str.findall(r'[A-Z]+\d?'))
        .drop('values', axis=1)
        .explode('gems')
        .drop_duplicates()
        .groupby('Name')
        .agg(pd.Series.tolist)
    )

Upvotes: 0

Chris
Chris

Reputation: 16162

You could get the set of the split/exploded values in a groupby, then take the len of that column.

df= df.groupby('Name').agg({'gems':lambda x: set(x.str.split(', ').explode())}).reset_index()
df['count'] = df['gems'].apply(len)

Upvotes: 0

Related Questions