piRSquared

Reputation: 294228

Sort by one column within groups of another without changing the positions of the grouping column

Consider the df:

import pandas as pd
df = pd.DataFrame(dict(A=list('babbaa'), B=list('zxyxzy')))
df

I want to sort B within the groups defined by A, but I don't want the positions of A to change.

If I try:

df.groupby('A', sort=False) \
    .apply(pd.DataFrame.sort_values, by='B') \
    .reset_index(drop=True)

You'll notice that A is grouped together. I wanted A to stay in its original order (b, a, b, b, a, a), with B sorted within each group, so that B reads x, x, y, z, y, z.

Upvotes: 2

Views: 83

Answers (2)

Nickil Maveli

Reputation: 29711

For your contrived example:

Sort by both A and B, then move A into the index and reset it so the result gets a fresh integer index. This is the reference DataFrame.

A = df.sort_values(['A', 'B']).set_index('A').reset_index()

Next, append A to the existing integer index with append=True and sort on that level (level=1, the one holding A). Then reset only that level, so the original integer positions remain as the index, now ordered by group.

B = df.set_index('A', append=True).sort_index(level=1).reset_index(level=1)

Assign B's index to A, then sort A by its index to put every row back in the position its group originally occupied.

A.index = B.index
A.sort_index()
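
The three steps above can be put together as one runnable sketch. The names `reference`, `by_group` and `result` are illustrative stand-ins for the `A` and `B` variables used above (renamed only to avoid confusion with the column labels); the calls themselves are the ones from this answer.

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('babbaa'), B=list('zxyxzy')))

# Step 1: rows sorted by group and then by B, with a fresh 0..n-1 index.
reference = df.sort_values(['A', 'B']).set_index('A').reset_index()

# Step 2: keep the original integer positions as the index, ordered by group.
by_group = df.set_index('A', append=True).sort_index(level=1).reset_index(level=1)

# Step 3: give each sorted row a position its group occupied originally,
# then sort by index to restore the original layout of A.
reference.index = by_group.index
result = reference.sort_index()

print(result)
```

Note that `sort_index()` returns a new DataFrame, so the result has to be assigned (or displayed) rather than relied on as an in-place change.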

Upvotes: 0

piRSquared

Reputation: 294228

Here's what I've come up with:

import numpy as np
import pandas as pd

df = pd.DataFrame(dict(A=list('babbaa'), B=list('zxyxzy')))

A, B = df.A.values, df.B.values

Use np.unique with return_inverse=True. The inverse iv gives, for every row, the position of its group label in the sorted unique values u.

u, iv = np.unique(A, return_inverse=True)

Use the inverse and broadcasting to build one boolean mask per group: row k of is_ is True wherever the row of df belongs to group u[k].

is_ = np.arange(len(u))[:, None] == iv
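
To make the two intermediate arrays concrete, here is a small sketch that prints them for the example data. It rebuilds `A` directly as a NumPy array so that it runs on its own; otherwise it uses the same calls as above.

```python
import numpy as np

A = np.array(list('babbaa'))

# u holds the sorted unique labels; iv maps every row to its label's slot in u.
u, iv = np.unique(A, return_inverse=True)
print(u)    # ['a' 'b']
print(iv)   # [1 0 1 1 0 0]

# Comparing a column of group codes against iv broadcasts to shape
# (number of groups, number of rows): one boolean mask per group.
is_ = np.arange(len(u))[:, None] == iv
print(is_.astype(int))
# [[0 1 0 0 1 1]
#  [1 0 1 1 0 0]]
```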

Loop over the masks. For each group, reorder that group's entries in a position-tracking array i according to the argsort of B within the group.

i = np.arange(len(df))
for r in is_:
    # r selects one group's rows; shuffle their positions into sorted-B order
    i[r] = i[r][B[r].argsort()]

Index the frame with the new positions:

df.iloc[i]

At the moment, I can't figure out how to get rid of that loop.
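
One possible way to drop the loop, offered as a sketch rather than as part of the answer above: a stable argsort of A lists the slots each group occupies, np.lexsort((B, A)) lists the rows in group-then-B order, and writing the second into the first should produce the same position array. The names `slots` and `order` are illustrative, and the conversion with `dtype=str` assumes both columns hold plain strings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(A=list('babbaa'), B=list('zxyxzy')))

# Assumption: both columns are strings, so convert to NumPy string arrays.
A = df['A'].to_numpy(dtype=str)
B = df['B'].to_numpy(dtype=str)

# Slots occupied by each group, in original order within the group.
slots = np.argsort(A, kind='stable')   # [1 4 5 0 2 3]

# Rows ordered by A first, then by B (lexsort's last key is the primary one).
order = np.lexsort((B, A))             # [1 5 4 3 2 0]

# Fill each group's slots with that group's rows in sorted-B order.
i = np.empty(len(df), dtype=np.intp)
i[slots] = order

print(df.iloc[i])
```

Because both `slots` and `order` are grouped by A in the same sorted order and with the same group sizes, their segments line up group by group, which is what the loop did one mask at a time.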

Upvotes: 1
