kms
kms

Reputation: 2024

TypeError: unhashable type: 'dict' - pandas groupby

I am attempting to run a pandas groupby on my DataFrame.

import pandas as pd 

# Groupby
cols = ['col1','col2']

df = df.groupby(cols, as_index=False).agg({'r': lambda x: list(x)})

Some columns may have dicts as values. I attempt to find them and convert to string like this:

# Find columns containing dicts
col_dict = [i for i in df.columns if isinstance(dfd[i][0],dict)]
col_dict

# Dict to string
df[col_dict] = df[col_dict].astype(str)

How do I groupby a dataframe with columns containing dicts data type?

Upvotes: 0

Views: 2656

Answers (1)

ogdenkev
ogdenkev

Reputation: 2374

Given the title and the actual question you asked, I assume you are doing something like the following.

import pandas as pd

df = pd.DataFrame(
    {
        "col1": [{"k": 1, "k2": 2}, {"k": 3, "k2": 4}, {"k": 1, "k2": 2}, {"k": 3, "k2": 4}],
        "col2": [42, 88, 86, 75],
        "r": [3.1415, 2.71, 1.121, 0.333],
        "cat": ["a", "a", "b", "b"]
    }
)

df = df.groupby(by=["col1"], as_index=False).agg({'r': lambda x: list(x)})

which for me gives the error you mention, TypeError: unhashable type: 'dict'.

One way you could do this, as you've started to already, is to convert the dictionaries to strings. Here is how this could be done.

df = df.assign(
    group=lambda DF: DF.col1.astype(str)
).groupby(
    by=["group"],
    as_index=False
).agg(
    {'r': lambda x: list(x)}
)
print(df)

which produces

               group                r
0  {'k': 1, 'k2': 2}  [3.1415, 1.121]
1  {'k': 3, 'k2': 4}    [2.71, 0.333]

As to why grouping by the dictionaries directly raises the error, it seems that under the hood Pandas is using a hash table to create the group keys. Therefore, the values that are used for the group, must be hashable, which dictionaries are not. Converting them to strings would make them hashable. Another way would be to convert to a frozenset of (key, value) pairs, which matches the fact that dictionaries are equal if they have the same set of (key, value) pairs, regardless of order (see the docs).

This actually raises an interesting "gotcha". If your dictionaries happen to have the same (key, value) pairs, but in different orders, then converting them to strings may not work as expected. For example,

df = pd.DataFrame(
    {
        "col1": [{"k": 1, "k2": 2}, {"k": 3, "k2": 4}, {"k2": 2, "k": 1}, {"k": 3, "k2": 4}],
        "col2": [42, 88, 86, 75],
        "r": [3.1415, 2.71, 1.121, 0.333],
        "cat": ["a", "a", "b", "b"]
    }
)

df = df.assign(
    group=lambda DF: DF.col1.astype(str)
).groupby(
    by=["group"],
    as_index=False
).agg(
    {'r': lambda x: list(x)}
)

print(df)

produces

                group              r
0   {'k': 1, 'k2': 2}       [3.1415]
1   {'k': 3, 'k2': 4}  [2.71, 0.333]
2  {'k2': 2, 'k': 1}        [1.121]

Notice here that the dictionaries {'k': 1, 'k2': 2} and {'k2': 2, 'k1'} have the same (key, value) pairs but are considered different groups, even though

d1 = {"k": 1, "k2": 2}
d2 = {"k2": 2, "k": 1}
print(d1 == d2)

prints True

A more robust way to group by dictionaries that would create groups when the dictionaries are equal would be

df = df.assign(
    group=lambda DF: DF.col1.apply(lambda x: frozenset(x.items()))
).groupby(
    by=["group"],
    as_index=False
).agg(
    {'r': lambda x: list(x)}
)

Upvotes: 3

Related Questions