error: unhashable type: 'list'. While using df.groupby.apply

Question

Here's my dataframe:

I want to sort my dataframe by airline and then, within each airline group, by tweet_created. airline and tweet_created are two columns in my dataframe. I tried the following:

df.groupby(['airline']).apply(lambda x: x.sort_values(['tweet_created'])).reset_index(drop = True)

But got this error:

unhashable type: 'list'

I don't understand what's going wrong here. Can someone help me?

jpp · Accepted Answer

From your sample dataframe, it appears your airline series consists of list objects. Since list is mutable and not hashable, it can't be used for grouping operations. Internally, GroupBy relies on hashing.
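To see the underlying restriction without pandas involved, here is a minimal sketch in plain Python. The variable names are illustrative only; the point is that hashing a list fails with the message from the question, while the immutable equivalents (a string, or a tuple containing it) are hashable.

```python
from collections.abc import Hashable

# A list is mutable, so Python refuses to hash it.
try:
    hash(['united'])
    list_error = None
except TypeError as exc:
    list_error = str(exc)  # the message reported in the question

# Immutable values such as str and tuple are hashable; a list is not.
str_is_hashable = isinstance('united', Hashable)
tuple_is_hashable = isinstance(('united',), Hashable)
list_is_hashable = isinstance(['united'], Hashable)

print(list_error)
print(str_is_hashable, tuple_is_hashable, list_is_hashable)
```

Any column whose cells are lists will hit the same restriction when used as a grouping key.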

Assuming each list within your airline series consists of only one element, you can transform your data before grouping. One way is via itertools.chain.

from itertools import chain
import pandas as pd

df = pd.DataFrame({'airline': [['VirginAmerica'], ['united'], ['USAirways']]})

df['airline'] = list(chain.from_iterable(df['airline']))

print(df)

         airline
0  VirginAmerica
1         united
2      USAirways
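Putting this together with the original goal, the following is a rough sketch rather than a definitive solution. The sample rows and timestamps are made up for illustration, and it assumes every airline cell really is a one-element list (checked explicitly below). Once the column holds plain strings, a single sort_values call on both columns should give the same ordering as sorting by tweet_created within each airline group, so groupby may not be needed at all.

```python
from itertools import chain

import pandas as pd

# Hypothetical sample data: every 'airline' cell is a one-element list.
df = pd.DataFrame({
    'airline': [['united'], ['VirginAmerica'], ['united'], ['USAirways']],
    'tweet_created': pd.to_datetime([
        '2015-02-24 11:35', '2015-02-24 11:15',
        '2015-02-23 09:00', '2015-02-22 12:00',
    ]),
})

# Check the one-element assumption before flattening; chain would
# otherwise produce a different number of values than there are rows.
assert df['airline'].map(len).eq(1).all()

# Flatten the lists into plain (hashable) strings.
df['airline'] = list(chain.from_iterable(df['airline']))

# Sort by airline first, then by tweet_created within each airline.
result = df.sort_values(['airline', 'tweet_created']).reset_index(drop=True)
print(result)
```

Note that the string sort is case-sensitive, so names starting with an uppercase letter are ordered before lowercase ones such as united.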

Some performance benchmarking of alternative methods:

# pandas v0.19.2, python 3.6.0
import numpy as np

df = pd.concat([df]*1000, ignore_index=True)

%timeit list(chain.from_iterable(df['airline']))  # 228 µs per loop
%timeit np.concatenate(df['airline'])             # 84.9 ms per loop
%timeit df['airline'].apply(pd.Series)            # 817 ms per loop
