How to group over lists in pandas dataframe

I have a dataframe which looks like this:

df = pd.DataFrame({'col1': [['a','b','c'], ['a','d'], ['c','c']]})

And I want to group the dataframe so it will look like this:

result = pd.DataFrame({'col1': [['a'], ['b'], ['c'], ['d']], 'count': [[2],[1],[3],[1]]})

If I use df.groupby('col1').count(), I get the error:

TypeError: unhashable type: 'list'

How to solve this?

Upvotes: 1

Answers (1)

jezrael

Reputation: 862671

You need to flatten the lists with the DataFrame constructor, create a Series with stack, and finally call value_counts:

df1 = pd.DataFrame(df['col1'].values.tolist()).stack().value_counts().reset_index()
df1.columns = ['col1','count']
df1 = df1.sort_values('col1')
print (df1)
  col1  count
1    a      2
2    b      1
0    c      3
3    d      1
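
To make the chain easier to follow, here is a rough sketch of the same steps with the intermediate objects named. The names wide, flat and counts are only illustrative and not part of the original one-liner:

```python
import pandas as pd

df = pd.DataFrame({'col1': [['a', 'b', 'c'], ['a', 'd'], ['c', 'c']]})

# One list element per cell; shorter lists are padded with missing values
wide = pd.DataFrame(df['col1'].tolist())

# Reshape all columns into a single Series of values
flat = wide.stack()

# Count each distinct value; value_counts() skips missing values by default,
# so the padding does not show up in the result
counts = flat.value_counts()
print(counts)
```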

If you really want lists (some pandas functions may fail on them), add applymap:

df1 = df1.applymap(lambda x: [x])
print (df1)
  col1 count
1  [a]   [2]
2  [b]   [1]
0  [c]   [3]
3  [d]   [1]
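
Note that DataFrame.applymap is deprecated since pandas 2.1 in favour of DataFrame.map. A sketch that should behave the same regardless of version is to map each column with Series.map. The small df1 below is rebuilt by hand from the output above so the snippet runs on its own, and the name wrapped is just illustrative:

```python
import pandas as pd

# Counts table as produced by the first solution (rebuilt by hand here)
df1 = pd.DataFrame({'col1': ['a', 'b', 'c', 'd'], 'count': [2, 1, 3, 1]})

# Wrap every cell in a one-element list, one column at a time
wrapped = df1.apply(lambda col: col.map(lambda x: [x]))
print(wrapped)
```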

Another solution with Counter + numpy.concatenate:

from collections import Counter
import numpy as np

df1 = pd.Series(Counter(np.concatenate(df['col1']))).reset_index()
df1.columns = ['col1','count']
df1 = df1.applymap(lambda x: [x])
print (df1)
  col1 count
0  [a]   [2]
1  [b]   [1]
2  [c]   [3]
3  [d]   [1]
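
If you are on pandas 0.25 or newer, DataFrame.explode is another way to flatten the lists. This is not one of the solutions above, only a minimal sketch of an alternative using the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({'col1': [['a', 'b', 'c'], ['a', 'd'], ['c', 'c']]})

# explode() gives every list element its own row, so the values become
# plain (hashable) strings that groupby can handle
df1 = (df.explode('col1')
         .groupby('col1')
         .size()
         .reset_index(name='count'))
print(df1)
```

The result has one row per distinct value, sorted by col1, and can be wrapped into lists afterwards in the same way as before if needed.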

Upvotes: 2
