Reputation: 6682
Say I have a pandas DataFrame whose data look like this:
import numpy as np
import pandas as pd
n = 30
df = pd.DataFrame({'a': np.arange(n),
                   'b': np.random.choice([0, 1, 2], n),
                   'c': np.arange(n)})
Question: how do I permute the groups (grouped by column b)?
I don't want a permutation within each group, but a permutation at the group level.
Example
Before
a b c
1 0 1
2 0 2
3 1 3
4 1 4
5 2 5
6 2 6
After
a b c
3 1 3
4 1 4
1 0 1
2 0 2
5 2 5
6 2 6
Basically, before the permutation, df['b'].unique() == [0, 1, 2]; after the permutation, df['b'].unique() == [1, 0, 2].
Upvotes: 1
Views: 402
Reputation: 10843
Here's an answer inspired by the accepted answer to this SO post, which uses a temporary Categorical column as a sorting key to do custom sort orderings. In this answer, I produce all permutations, but you can just take the first one if you are looking for only one.
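import itertools

df_results = []
# Every possible ordering of the group labels found in column "b"
orderings = itertools.permutations(df["b"].unique())
for ordering in orderings:
    df_2 = df.copy()
    # Temporary categorical key whose category order is the desired group order
    df_2["b_key"] = pd.Categorical(df_2["b"], categories=list(ordering))
    # A stable sort keeps the original row order inside each group
    df_2 = df_2.sort_values("b_key", kind="stable").drop(columns="b_key")
    df_results.append(df_2)

# Use a separate loop variable so the original df is not overwritten
for df_perm in df_results:
    print(df_perm)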
The idea here is that we create a new categorical variable each time, with a slightly different enumerated order, then sort by it. We discard it at the end once we no longer need it.
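If only one random group ordering is needed, the same categorical-key idea can be applied once instead of looping over every permutation. The following is a minimal sketch, not part of the original answer: the small example frame, the seed, and the names rng, ordering, key and shuffled are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (assumed data, same columns as in the question)
df = pd.DataFrame({"a": np.arange(6), "b": [0, 0, 1, 1, 2, 2], "c": np.arange(6)})

# Seeded generator so the sketch is reproducible; the seed itself is arbitrary
rng = np.random.default_rng(0)

# One random ordering of the distinct group labels
ordering = rng.permutation(df["b"].unique())

# Temporary ordered categorical key: its category order is the group order we want
key = pd.Categorical(df["b"], categories=ordering, ordered=True)

# A stable sort on the key moves whole groups and keeps row order inside each group
shuffled = (
    df.assign(b_key=key)
      .sort_values("b_key", kind="stable")
      .drop(columns="b_key")
)
print(shuffled)
```

After sorting, shuffled["b"].unique() lists the groups in the same order as ordering.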
Upvotes: 1
Reputation: 210882
If I understood your question correctly, you can do it this way:
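n = 30
df = pd.DataFrame({'a': np.arange(n),
                   'b': np.random.choice([0, 1, 2], n),
                   'c': np.arange(n)})
# order[v] is the sort position of group value v:
# here 0 -> 1, 1 -> 0, 2 -> 2, so group 1 comes first, then 0, then 2
order = pd.Series([1, 0, 2])
cols = df.columns  # remember the original columns before adding the helper
df['idx'] = df.b.map(order)
# Sort by group position, then by the original index to keep row order within each group
df = df.reset_index().sort_values(['idx', 'index'])[cols]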
Step by step:
In [103]: df['idx'] = df.b.map(order)
In [104]: df
Out[104]:
a b c idx
0 0 2 0 2
1 1 0 1 1
2 2 1 2 0
3 3 0 3 1
4 4 1 4 0
5 5 1 5 0
6 6 1 6 0
7 7 2 7 2
8 8 0 8 1
9 9 1 9 0
10 10 0 10 1
11 11 1 11 0
12 12 0 12 1
13 13 2 13 2
14 14 0 14 1
15 15 2 15 2
16 16 1 16 0
17 17 2 17 2
18 18 1 18 0
19 19 1 19 0
20 20 0 20 1
21 21 0 21 1
22 22 1 22 0
23 23 1 23 0
24 24 2 24 2
25 25 0 25 1
26 26 0 26 1
27 27 0 27 1
28 28 1 28 0
29 29 1 29 0
In [105]: df.reset_index().sort_values(['idx', 'index'])
Out[105]:
index a b c idx
2 2 2 1 2 0
4 4 4 1 4 0
5 5 5 1 5 0
6 6 6 1 6 0
9 9 9 1 9 0
11 11 11 1 11 0
16 16 16 1 16 0
18 18 18 1 18 0
19 19 19 1 19 0
22 22 22 1 22 0
23 23 23 1 23 0
28 28 28 1 28 0
29 29 29 1 29 0
1 1 1 0 1 1
3 3 3 0 3 1
8 8 8 0 8 1
10 10 10 0 10 1
12 12 12 0 12 1
14 14 14 0 14 1
20 20 20 0 20 1
21 21 21 0 21 1
25 25 25 0 25 1
26 26 26 0 26 1
27 27 27 0 27 1
0 0 0 2 0 2
7 7 7 2 7 2
13 13 13 2 13 2
15 15 15 2 15 2
17 17 17 2 17 2
24 24 24 2 24 2
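Note that in the steps above the Series passed to map gives the sort position of each group value, which coincides with the desired order for [1, 0, 2] only because that permutation is its own inverse. For an arbitrary target order, the lookup can be built the other way round. This is a sketch under assumptions: the example frame and the names desired, rank and result are hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (assumed data)
df = pd.DataFrame({"a": np.arange(6), "b": [2, 0, 1, 0, 1, 2], "c": np.arange(6)})

# Hypothetical target order of the groups: 2 first, then 0, then 1
desired = [2, 0, 1]

# rank[v] is the position of group value v in the desired order
rank = pd.Series(range(len(desired)), index=desired)

# Stable argsort on the mapped ranks keeps the original row order within each group
positions = np.argsort(df["b"].map(rank).to_numpy(), kind="stable")
result = df.iloc[positions]
print(result["b"].tolist())  # [2, 2, 0, 0, 1, 1]
```

Using positional selection with iloc avoids adding and then dropping a helper column.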
Upvotes: 1