Reputation: 73
Here's my dataframe:
I want to sort my dataframe by airline and then, within each airline group, by tweet_created. airline and tweet_created are two columns in my dataframe. I tried the following:
df.groupby(['airline']).apply(lambda x: x.sort_values(['tweet_created'])).reset_index(drop = True)
But got this error:
unhashable type: 'list'
I don't understand what's going wrong here. Can someone help me?
Upvotes: 4
Views: 3857
Reputation: 164623
From your sample dataframe, it appears your airline series consists of list objects. Since list is mutable and not hashable, it can't be used for grouping operations. Internally, GroupBy relies on hashing.
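To make the hashing point concrete, here is a minimal sketch. The sample values are hypothetical stand-ins for your data, and the lambda-based unwrapping is just one simple way to get plain strings; it assumes every list holds exactly one element.

```python
import pandas as pd

# Hypothetical sample: each airline value is wrapped in a one-element list
df = pd.DataFrame({'airline': [['united'], ['VirginAmerica'], ['united']]})

try:
    hash(df['airline'].iloc[0])  # lists are mutable, so hash() rejects them
    list_hashable = True
except TypeError as exc:
    list_hashable = False
    message = str(exc)
    print(message)

# Strings are hashable, so grouping works once each value is unwrapped
unwrapped = df['airline'].map(lambda x: x[0])
group_sizes = unwrapped.groupby(unwrapped).size()
print(group_sizes)
```

The same TypeError surfaces from groupby because the group keys have to be hashed.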
Assuming each list within your airline series consists of only one element, you can transform your data before grouping. One way is via itertools.chain.
from itertools import chain
import pandas as pd

df = pd.DataFrame({'airline': [['VirginAmerica'], ['united'], ['USAirways']]})
# Flatten the one-element lists into plain strings
df['airline'] = list(chain.from_iterable(df['airline']))
print(df)

         airline
0  VirginAmerica
1         united
2      USAirways
Some performance benchmarking of alternative methods:
# pandas v0.19.2, python 3.6.0
import numpy as np
df = pd.concat([df]*1000, ignore_index=True)
%timeit list(chain.from_iterable(df['airline'])) # 228 µs per loop
%timeit np.concatenate(df['airline']) # 84.9 ms per loop
%timeit df['airline'].apply(pd.Series) # 817 ms per loop
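Once the column holds plain strings, the sort you originally wanted may not need groupby at all. The sketch below assumes tweet_created can be parsed as datetimes; the timestamps are invented for illustration and are not taken from your data.

```python
from itertools import chain
import pandas as pd

# Hypothetical data; the tweet_created values are made up for this example
df = pd.DataFrame({
    'airline': [['united'], ['VirginAmerica'], ['united'], ['VirginAmerica']],
    'tweet_created': pd.to_datetime([
        '2015-02-24 11:35', '2015-02-24 11:15',
        '2015-02-24 10:01', '2015-02-24 11:50',
    ]),
})

# Unwrap the one-element lists so the column holds hashable strings
df['airline'] = list(chain.from_iterable(df['airline']))

# Sort by airline first, then by tweet_created within each airline
result = df.sort_values(['airline', 'tweet_created']).reset_index(drop=True)
print(result)
```

Passing both columns to sort_values orders rows by airline and breaks ties by tweet_created, which should give the same row order as the groupby/apply approach.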
Upvotes: 1