gh1222

Reputation: 397

How to group by two columns, ignoring the order of their values?

I have a dataframe:

val1   val2   val3
a       b      10
a       b      2
b       a      3
f       k      5
f       k      2

When I do df.groupby(["val1", "val2"])["val3"].mean().reset_index(), I get:

val1   val2   val3
a       b      6
b       a      3
f       k      3.5

But I don't want the order of val1 and val2 to matter: the pairs (a, b) and (b, a) should fall into the same group. So the desired result is:

val1   val2   val3
a       b      5
f       k      3.5

How can I do that?

Upvotes: 0

Views: 135

Answers (2)

Andrej Kesely

Reputation: 195408

Another solution, with frozenset:

x = (
    df.groupby(df[["val1", "val2"]].apply(frozenset, axis=1))
    .agg({"val1": "first", "val2": "first", "val3": "mean"})
    .reset_index(drop=True)
)
print(x.to_markdown())

Prints:

     val1   val2   val3
0    a      b      5
1    f      k      3.5
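To see why this works, here is a minimal sketch, assuming the sample frame from the question is rebuilt by hand (the `df` and `key` names are only illustrative). A frozenset is hashable and compares equal regardless of element order, so (a, b) and (b, a) produce the same group key:

```python
import pandas as pd

# Sample data rebuilt from the question (assumed dtypes: str, str, int).
df = pd.DataFrame(
    {
        "val1": ["a", "a", "b", "f", "f"],
        "val2": ["b", "b", "a", "k", "k"],
        "val3": [10, 2, 3, 5, 2],
    }
)

# One frozenset per row; element order inside the set does not matter.
key = df[["val1", "val2"]].apply(frozenset, axis=1)

# The third row (b, a) gets the same key as the first row (a, b).
same_key = key.iloc[0] == key.iloc[2]
n_groups = key.nunique()
```

Note that "first" in the aggregation keeps whichever orientation of the pair appears first in the data, so the val1/val2 shown for a group are not necessarily sorted.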

Upvotes: 0

d.b

Reputation: 32548

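import pandas as pd

nm = ["val1", "val2"]
# Sort each (val1, val2) pair so that (b, a) and (a, b) share one key
grp = df[nm].apply(lambda x: tuple(sorted(x)), axis=1)
s = df.val3.groupby(grp).mean()
# Turn the tuple keys back into val1/val2 index levels
s.index = pd.MultiIndex.from_tuples(s.index, names=nm)
s.reset_index()
#   val1 val2  val3
# 0    a    b   5.0
# 1    f    k   3.5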

Upvotes: 1
