CoMartel

Reputation: 3591

How can I get basic statistics about a Series of lists?

I have a pandas DataFrame and I would like to get basic statistics about it, such as the number of unique values and the number of occurrences of each value. Something like df.describe().

My issue is that some columns contain lists, and I get this error:

>>> df["col_a"].nunique()
TypeError: unhashable type: 'list'

My column looks like this:

col_a:
["a","b"]
["b","a"]
["c"]
["a","b","c"]
[]
NaN

What is the simplest way to handle this issue?

Upvotes: 1

Views: 1511

Answers (1)

IanS

Reputation: 16251

Convert the lists to tuples, which are hashable. dropna() skips the NaN row, and the index-aligned assignment leaves that row as NaN:

df['col_a'] = df['col_a'].dropna().apply(tuple)

Output:

       col_a
0     (a, b)
1     (b, a)
2       (c,)
3  (a, b, c)
4         ()
5        NaN

You can now call nunique(), which returns 5 (NaN is not counted by default, and (a, b) and (b, a) are distinct tuples):

df['col_a'].nunique()
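For completeness, here is a self-contained sketch of the same approach on the sample column from the question, extended to the per-value counts the question also asks about. The variable names (n_unique, counts, n_unique_unordered) are only illustrative, and the last step assumes you might want ["a","b"] and ["b","a"] treated as the same value, which the question does not state.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"col_a": [["a", "b"], ["b", "a"], ["c"], ["a", "b", "c"], [], np.nan]}
)

# Lists are unhashable, tuples are hashable. dropna() skips the NaN row,
# and assigning back aligns on the index, so that row stays NaN.
df["col_a"] = df["col_a"].dropna().apply(tuple)

# Number of distinct tuples (NaN is excluded by default).
n_unique = df["col_a"].nunique()

# Number of occurrences of each distinct tuple.
counts = df["col_a"].value_counts()

# Optional, and an assumption about intent: ignore element order by
# sorting each tuple before counting.
unordered = df["col_a"].dropna().apply(lambda t: tuple(sorted(t)))
n_unique_unordered = unordered.nunique()

print(n_unique)
print(counts)
print(n_unique_unordered)
```

With order preserved, every tuple in this sample occurs once. Sorting first merges (a, b) and (b, a) into a single value.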

Upvotes: 3
