Reputation: 37
I have a pandas DataFrame whose rows I want to shuffle while keeping the order of one column.
So imagine I have the following df:
| i | val | val2 | ID |
| 0 | 2   | 2    | a  |
| 1 | 3   | 3    | b  |
| 2 | 4   | 4    | a  |
| 3 | 6   | 5    | b  |
| 4 | 5   | 6    | b  |
I want to shuffle the rows but keep the ID column in the same order as in the original df. The result I want would look something like this:
| i | val | val2 | ID |
| 2 | 4   | 4    | a  |
| 4 | 5   | 6    | b  |
| 0 | 2   | 2    | a  |
| 3 | 6   | 5    | b  |
| 1 | 3   | 3    | b  |
How do I do this?
Upvotes: 1
Views: 475
Reputation: 12503
Here's a solution:
import pandas as pd

df = pd.DataFrame({"val": [1, 2, 3, 4, 5, 6, 7], "ID": ["a", "b", "a", "b", "a", "a", "b"]})
df["val"] = df.groupby("ID").transform(lambda x: x.sample(frac=1))
print(df)
One possible output (the shuffle is random, so yours will differ):
   val ID
0    5  a
1    7  b
2    1  a
3    2  b
4    3  a
5    6  a
6    4  b
If you have a dataframe with multiple columns, and you'd like to shuffle while maintaining the order of one of the columns, the solution is very similar:
df = pd.DataFrame({
    "val": [1, 2, 3, 4, 5, 6, 7],
    "val2": range(10, 17),
    "ID": ["a", "b", "a", "b", "a", "a", "b"],
})
df[["val", "val2"]] = df.groupby("ID").transform(lambda x: x.sample(frac=1))
print(df)
One possible output:
   val  val2 ID
0    3    12  a
1    7    16  b
2    5    14  a
3    2    11  b
4    6    15  a
5    1    10  a
6    4    13  b
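The transform approach depends on how your pandas version aligns what the lambda returns; as far as I know, some newer releases realign a returned DataFrame to the original index, which would undo the shuffle. If you see that, here is a minimal sketch of an alternative that permutes row positions within each ID group instead. The helper name `shuffle_within_groups` and its `seed` parameter are my own choices for illustration, not part of pandas. Whole rows move together, so the original index travels with each row, as in the question's wanted result.

```python
import numpy as np
import pandas as pd


def shuffle_within_groups(df, key, seed=None):
    """Shuffle whole rows, but only among rows sharing the same `key` value.

    Hypothetical helper for illustration; `seed` makes the shuffle repeatable.
    """
    rng = np.random.default_rng(seed)
    # order[p] is the position of the original row that ends up at position p.
    order = np.arange(len(df))
    # GroupBy.indices maps each group label to the integer positions of its rows.
    for positions in df.groupby(key).indices.values():
        # Permute positions inside the group only, so `key` keeps its order.
        order[positions] = rng.permutation(positions)
    return df.iloc[order]


df = pd.DataFrame({
    "val": [2, 3, 4, 6, 5],
    "val2": [2, 3, 4, 5, 6],
    "ID": ["a", "b", "a", "b", "b"],
})
shuffled = shuffle_within_groups(df, "ID", seed=0)
print(shuffled)
```

Call `shuffled.reset_index(drop=True)` afterwards if you would rather have a fresh 0..n-1 index than the shuffled original one.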
Upvotes: 1