PythonFisher
PythonFisher

Reputation: 159

Counting Unique Values in a Column

I have a df that has one column with multiple comma-separated values in each row. I want to count how many times a unique value occurs in that column.

The df looks like this:

                             category  country
0  widget1, widget2, widget3, widget4      USA
1                    widget1, widget3      USA
2                   widget1, widget2     China
3                             widget2   Canada
4           widget1, widget2, widget3    China
5                             widget2  Vietnam
6                             widget3   Canada
7                    widget1, widget3      USA
8                    widget1, widget3    Japan
9                             widget2  Germany 

sample dataframe

I want know how many times each widget appears in the column "category". The results in this example would be:

widget1 = 6, widget2 = 6, widget3 = 6, widget4 = 1

I can use .value_counts

df["category"].value_counts()

but that's only going to return rows that are exactly the same.

enter image description here

I could use value_counts and enter each value for it to count, but in the actual DataFrame there are too many rows and unique values in that column to make it practical.

Also, is there a way to not double count if a single row contains two values that are the same? For example is there was a "widget1, black widget1, yellow widget1" in the same row, I'd just want to count that as one widget1.

Upvotes: 2

Views: 1505

Answers (3)

iamchoosinganame
iamchoosinganame

Reputation: 1120

This might not be the most elegant solution but I think it should work. Basically we need to separate each word in the Category column and then count the words.

from itertools import chain
words=[i.split(',') for i in df['Category'].tolist()]
words=[i.strip() for i in chain.from_iterable(words)]
pd.Series(words).value_counts()

Upvotes: 0

Erfan
Erfan

Reputation: 42886

Another solution would be to unnest your string to rows, then use value_counts:

explode_str(df, 'category', ',').value_counts()

widget2    6
widget1    6
widget3    6
widget4    1
Name: category, dtype: int64

Function used from linked answer:

def explode_str(df, col, sep):
    s = df[col]
    i = np.arange(len(s)).repeat(s.str.count(sep) + 1)
    return df.iloc[i].assign(**{col: sep.join(s).split(sep)})

Upvotes: 1

BENY
BENY

Reputation: 323226

Do with get_dummies

df.category.str.get_dummies(',').replace(0,np.nan).stack().sum(level=1)

Upvotes: 4

Related Questions