Justin

Reputation: 73

error: unhashable type: 'list'. While using df.groupby.apply

Here's my dataframe:

[image of the dataframe not available]

I want to sort my dataframe by airline and then, within each airline, by tweet_created. Both airline and tweet_created are columns in my dataframe. I tried the following:

df.groupby(['airline']).apply(lambda x: x.sort_values(['tweet_created'])).reset_index(drop = True)

But got this error:

unhashable type: 'list'

I don't understand what's going wrong here. Can someone help me?

Upvotes: 4

Views: 3857

Answers (1)

jpp

Reputation: 164623

From your sample dataframe, it appears your airline series consists of list objects. Since a list is mutable and therefore not hashable, it can't be used as a grouping key. Internally, GroupBy hashes the group keys, so every key must be hashable.
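As a quick illustration of the hashing point, the sketch below calls Python's built-in hash() directly on a list and on a tuple. The variable names are only for illustration, and the exact wording of the exception message may vary between Python versions.

```python
# Lists are mutable, so Python does not define a hash for them.
try:
    hash(['VirginAmerica'])
    message = 'no error'
except TypeError as exc:
    message = str(exc)

print(message)  # the message mentions an unhashable type

# A tuple with the same contents is immutable and can be hashed.
tuple_hash_ok = isinstance(hash(('VirginAmerica',)), int)
print(tuple_hash_ok)
```

This is the same TypeError that surfaces when a column of lists is used as a groupby key.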

Assuming each list within your airline series consists of only one element, you can transform your data before grouping. One way is via itertools.chain.

import pandas as pd
from itertools import chain

df = pd.DataFrame({'airline': [['VirginAmerica'], ['united'], ['USAirways']]})

df['airline'] = list(chain.from_iterable(df['airline']))

print(df)

         airline
0  VirginAmerica
1         united
2      USAirways
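Once the airline column holds plain strings, the sort described in the question could be done without groupby at all. The following is a minimal sketch, not necessarily what the asker needs: the tweet_created values are made up for illustration (only the column names come from the question), and it assumes every list holds exactly one element.

```python
import pandas as pd
from itertools import chain

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'airline': [['united'], ['VirginAmerica'], ['united'], ['VirginAmerica']],
    'tweet_created': pd.to_datetime([
        '2015-02-24 11:35', '2015-02-24 11:15',
        '2015-02-23 09:00', '2015-02-22 08:00',
    ]),
})

# Flatten the one-element lists into plain strings so the column is hashable.
df['airline'] = list(chain.from_iterable(df['airline']))

# Sort by airline first, then by tweet_created within each airline.
result = df.sort_values(['airline', 'tweet_created']).reset_index(drop=True)
print(result)
```

Passing both columns to sort_values orders rows by the first column and breaks ties with the second, which gives the "sorted within each group" layout in a single call.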

Some performance benchmarking of alternative methods:

# pandas v0.19.2, python 3.6.0
# assumes df['airline'] still holds the original one-element lists

import numpy as np

df = pd.concat([df]*1000, ignore_index=True)

%timeit list(chain.from_iterable(df['airline']))  # 228 µs per loop
%timeit np.concatenate(df['airline'])             # 84.9 ms per loop
%timeit df['airline'].apply(pd.Series)            # 817 ms per loop

Upvotes: 1
