Reputation: 7994
I have a dataframe that has two columns:
Text Categories
"Hi Hello" [F35, B3, C98]
"Where is" [G58, F35, C17]
"Is she?!" [T92, F35, B3]
The Categories field holds a list of categories in each row.
I want to find how many distinct categories there are across the whole dataframe.
I tried this code, but it did not work:
print(len(sorted(set(df['Categories']))))
I also tried this, but it only covers a single record:
print(len(sorted(set(df['Categories'][0]))))
How can I do this for all the categories in the dataframe?
Upvotes: 0
Views: 42
Reputation: 17368
This should give you the unique categories across all rows.
In [127]: import pandas as pd
In [128]: df = pd.DataFrame({
...: 'Text': ["Hi Hello", "Where is","Is she?!"],
...: 'Categories': [["F35", "B3", "C98"],["G58", "F35", "C17"],["G58", "F35", "C17"]]
...: })
In [131]: set(df["Categories"].explode())
Out[131]: {'B3', 'C17', 'C98', 'F35', 'G58'}
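Since the question asks for the number of distinct categories rather than the set itself, here is a minimal sketch of the counting step. It rebuilds the frame with the three rows from the question (not the sample rows in the session above), and the variable names flat and distinct_count are only illustrative. It assumes a pandas version that provides Series.explode (0.25 or later).

```python
import pandas as pd

# Sample frame using the rows shown in the question.
df = pd.DataFrame({
    "Text": ["Hi Hello", "Where is", "Is she?!"],
    "Categories": [
        ["F35", "B3", "C98"],
        ["G58", "F35", "C17"],
        ["T92", "F35", "B3"],
    ],
})

# explode() turns each list element into its own row,
# so the Series holds one category per entry.
flat = df["Categories"].explode()

# Count the distinct values; len(set(flat)) gives the same number.
distinct_count = flat.nunique()
print(distinct_count)  # 6
```

Note that nunique() skips missing values by default, so rows whose list is empty (which explode() turns into NaN) are not counted as a category.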
Credit to @DanielGeffen: you can also use df["Categories"].explode().unique()
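As for why the first attempt in the question fails: set(df['Categories']) tries to hash each row's value, and each value is a Python list, which is unhashable. The sketch below reproduces that error and shows one alternative that flattens the lists with itertools.chain instead of explode(), which may help on older pandas versions. The sample data follows the question, and the names error_message and distinct are only illustrative.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    "Text": ["Hi Hello", "Where is", "Is she?!"],
    "Categories": [
        ["F35", "B3", "C98"],
        ["G58", "F35", "C17"],
        ["T92", "F35", "B3"],
    ],
})

# Each element of the column is a list, and lists cannot be hashed,
# so building a set directly from the column raises TypeError.
error_message = None
try:
    set(df["Categories"])
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Flatten the lists first, then build the set from the individual strings.
distinct = set(chain.from_iterable(df["Categories"]))
print(len(distinct))  # 6
```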
Upvotes: 2