asmgx

Reputation: 7994

Count all items in the list in dataframe (Python)

I have a dataframe that has 2 columns:

Text           Categories
"Hi Hello"     [F35, B3, C98]
"Where is"     [G58, F35, C17]
"Is she?!"     [T92, F35, B3]

The Categories field holds a list of categories.

I want to find how many distinct categories I have across the whole dataframe.

I tried this code, but it did not work:

print(len(sorted(set(df['Categories']))))

I also tried this, but it only covers one record:

print(len(sorted(set(df['Categories'][0]))))

How can I do this for all categories in the dataframe?

Upvotes: 0

Views: 42

Answers (1)

bigbounty

Reputation: 17368

This should give you the unique categories:

In [127]: import pandas as pd
In [128]: df = pd.DataFrame({
     ...:     'Text': ["Hi Hello", "Where is", "Is she?!"],
     ...:     'Categories': [["F35", "B3", "C98"],["G58", "F35", "C17"],["G58", "F35", "C17"]]
     ...: })
In [131]: set(df["Categories"].explode())
Out[131]: {'B3', 'C17', 'C98', 'F35', 'G58'}

Credit to @DanielGeffen: you can also use df["Categories"].explode().unique()
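
To get the actual count the question asks for, here is a minimal sketch. It assumes pandas 0.25 or later (where Series.explode is available) and uses the three sample rows from the question rather than the ones above. The names flat, n_distinct and counts are just illustrative. The original attempt, set(df['Categories']), fails with "TypeError: unhashable type: 'list'" because each cell is a list; exploding first turns every list element into its own row, so the values are plain strings.

```python
import pandas as pd

# Sample data copied from the question (assumed to be real Python lists)
df = pd.DataFrame({
    "Text": ["Hi Hello", "Where is", "Is she?!"],
    "Categories": [
        ["F35", "B3", "C98"],
        ["G58", "F35", "C17"],
        ["T92", "F35", "B3"],
    ],
})

# One row per list element, so every value is a hashable string
flat = df["Categories"].explode()

# Number of distinct categories across all rows
n_distinct = flat.nunique()
print(n_distinct)

# Optional: how often each category appears
counts = flat.value_counts()
print(counts)
```

With the question's data this should print 6 for the distinct count (B3, C17, C98, F35, G58, T92). If the Categories column actually holds strings such as "[F35, B3, C98]" instead of lists, it would need to be parsed into lists before explode() does anything useful.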

Upvotes: 2
