Reputation: 294228
Consider the DataFrame df:
import pandas as pd
df = pd.DataFrame(dict(A=list('babbaa'), B=list('zxyxzy')))
df
I want to sort B within groups defined by A, but I don't want the positions of A to change.
If I try:
df.groupby('A', sort=False) \
    .apply(pd.DataFrame.sort_values, by='B') \
    .reset_index(drop=True)
You'll notice that the result groups A together. What I wanted instead is A left in its original order (b, a, b, b, a, a), with B sorted within each group.
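Worked out by hand from the description above, the target frame would be the following. The name expected is only for illustration.

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('babbaa'), B=list('zxyxzy')))

# Hand-derived target: A keeps its original order, while the B values
# belonging to each A group appear in ascending order.
expected = pd.DataFrame(dict(A=list('babbaa'), B=list('xxyzyz')))
print(expected)
```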
Upvotes: 2
Views: 83
Reputation: 29711
For your contrived example:
Sort with respect to both A and B, and let A become the index. Then reset the index, which discards the old integer labels and gives a reference DataFrame numbered from 0 to n-1.
A = df.sort_values(['A', 'B']).set_index('A').reset_index()
Next, add A to the index alongside the existing integer index by passing append=True. Sort on the index level that holds A (level 1), then reset that level so A becomes a column again.
B = df.set_index('A', append=True).sort_index(level=1).reset_index(level=1)
Let A take on B's index, then sort by that index to restore the original row order.
A.index = B.index
A.sort_index()
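The three steps above can be put together and checked against the example. This is a minimal sketch: the names ref, pos and result are illustrative replacements for A and B, chosen only to avoid clashing with the column names.

```python
import pandas as pd

df = pd.DataFrame(dict(A=list('babbaa'), B=list('zxyxzy')))

# Reference frame: rows ordered by A, then B, with a fresh 0..n-1 index.
ref = df.sort_values(['A', 'B']).set_index('A').reset_index()

# Original integer labels listed group by group (index level 1 holds A).
pos = df.set_index('A', append=True).sort_index(level=1).reset_index(level=1)

# Attach the original labels to the sorted rows, then restore row order.
ref.index = pos.index
result = ref.sort_index()
print(result)
```

Note that sort_index returns a new frame, so the result has to be assigned to a name to be kept.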
Upvotes: 0
Reputation: 294228
Here's what I've come up with:
import numpy as np
import pandas as pd
df = pd.DataFrame(dict(A=list('babbaa'), B=list('zxyxzy')))
A, B = df.A.values, df.B.values
Use np.unique with return_inverse=True. The inverse array iv holds, for each row, the position of its A value among the sorted unique values u.
u, iv = np.unique(A, return_inverse=True)
Use inverse and broadcasting to create a row for every group, where each row is a boolean mask for that group.
is_ = np.arange(len(u))[:, None] == iv
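To make the mask concrete, here is a small sketch that prints the intermediate arrays for the example data. It only restates the two lines above on a standalone array.

```python
import numpy as np

A = np.array(list('babbaa'))

# u holds the sorted unique labels; iv maps every row to its label's slot in u.
u, iv = np.unique(A, return_inverse=True)

# Comparing a column of group numbers against iv broadcasts to shape
# (number of groups, number of rows): one boolean mask per group.
is_ = np.arange(len(u))[:, None] == iv

print(u)    # ['a' 'b']
print(iv)   # [1 0 1 1 0 0]
print(is_.astype(int))
```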
Loop over the rows of the mask and update a position-tracking array i: within each group, the positions are reordered by the sort order of that group's B values.
i = np.arange(len(df))
for r in is_:
    i[r] = i[r][B[r].argsort()]
Use the new position values to reorder the rows:
df.iloc[i]
At the moment, I can't figure out how to get rid of that loop.
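One possible way to drop the loop, offered as a sketch rather than a benchmarked solution: compute the slots each group occupies with a stable argsort, compute the rows in (A, B) order with np.lexsort, and scatter the second into the first. The names slots, order and b_codes are illustrative, and B is converted to integer codes first on the assumption that sorting plain integers is safer than sorting an object array.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(A=list('babbaa'), B=list('zxyxzy')))
A, B = df.A.values, df.B.values
u, iv = np.unique(A, return_inverse=True)

# Integer codes that preserve the ordering of B.
_, b_codes = np.unique(B, return_inverse=True)

# Row positions listed group by group, original order kept inside each group.
slots = np.argsort(iv, kind='stable')

# Rows ordered by group first, then by B within each group
# (np.lexsort treats the last key as the primary one).
order = np.lexsort((b_codes, iv))

# Fill each group's slots with that group's rows in B order.
i = np.empty(len(df), dtype=int)
i[slots] = order

result = df.iloc[i]
print(result)
```

On the example data this produces the same position array as the loop version.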
Upvotes: 1