Ghislaine Boogerd

Reputation: 37

How to shuffle a dataframe while maintaining the order of a specific column

I have a pandas DataFrame that I want to shuffle while keeping the order of one column.

So imagine I have the following df:

| i | val | val2 | ID |
| 0 | 2   | 2    | a  |
| 1 | 3   | 3    | b  |
| 2 | 4   | 4    | a  |
| 3 | 6   | 5    | b  |
| 4 | 5   | 6    | b  |

I want to shuffle the rows but keep the ID column in its original order (a, b, a, b, b), so each row may only move to a position that has the same ID. The result I want would look something like this:

| i | val | val2 | ID |
| 2 | 4   | 4    | a  |
| 4 | 5   | 6    | b  |
| 0 | 2   | 2    | a  |
| 3 | 6   | 5    | b  |
| 1 | 3   | 3    | b  |

How do I do this?

Upvotes: 1

Views: 475

Answers (1)

Roy2012

Reputation: 12503

Here's a solution:

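import pandas as pd

df = pd.DataFrame({"val": [1, 2, 3, 4, 5, 6, 7], "ID": ["a", "b", "a", "b", "a", "a", "b"]})
# .to_numpy() drops the shuffled index so transform can't realign (un-shuffle) the values
df["val"] = df.groupby("ID")["val"].transform(lambda x: x.sample(frac=1).to_numpy())
print(df)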

One possible output (the shuffle is random, so yours will differ):

   val ID
0    5  a
1    7  b
2    1  a
3    2  b
4    3  a
5    6  a
6    4  b
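Each run of the code above shuffles differently. If you need the result to be repeatable (for example in tests), one option is a small helper that draws every group's permutation from a single seeded NumPy generator. This is a sketch, not part of the original answer: the name `shuffle_column_within` and its `seed` argument are hypothetical, and it swaps `Series.sample` for NumPy's `Generator.permutation` (NumPy 1.17 or later assumed).

```python
import numpy as np
import pandas as pd


def shuffle_column_within(df: pd.DataFrame, col: str, by: str, seed=None) -> pd.DataFrame:
    """Return a copy of df in which `col` is shuffled among rows sharing the same `by` value."""
    rng = np.random.default_rng(seed)  # one seeded generator shared by every group
    out = df.copy()
    # rng.permutation returns a shuffled NumPy array, which has no index,
    # so transform writes it back positionally into each group's rows.
    out[col] = df.groupby(by)[col].transform(lambda s: rng.permutation(s.to_numpy()))
    return out


df = pd.DataFrame({"val": [1, 2, 3, 4, 5, 6, 7], "ID": ["a", "b", "a", "b", "a", "a", "b"]})
print(shuffle_column_within(df, "val", "ID", seed=0))
```

The same `seed` always gives the same arrangement; passing `seed=None` gives a fresh shuffle on every call.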

If you have a dataframe with multiple columns, and you'd like to shuffle while maintaining the order of one of the columns, the solution is very similar:

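import numpy as np
import pandas as pd

df = pd.DataFrame({"val": [1, 2, 3, 4, 5, 6, 7],
                   "val2": range(10, 17),
                   "ID": ["a", "b", "a", "b", "a", "a", "b"],
                  })

# Shuffle row *positions* within each ID group, then move "val" and "val2" together,
# so the values in a row stay paired (shuffling each column separately would not).
pos = pd.Series(np.arange(len(df)), index=df.index)
order = pos.groupby(df["ID"]).transform(lambda p: p.sample(frac=1).to_numpy()).to_numpy()
df[["val", "val2"]] = df[["val", "val2"]].to_numpy()[order]
print(df)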

One possible output:

   val  val2 ID
0    3    12  a
1    7    16  b
2    5    14  a
3    2    11  b
4    6    15  a
5    1    10  a
6    4    13  b
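The code above keeps the index where it is and moves the values. The desired output in the question instead moves whole rows, index label included. A minimal sketch of that variant follows, assuming NumPy's `Generator` API; the function name `shuffle_rows_keep_column` and its `seed` parameter are hypothetical. It permutes each group's row positions (taken from `GroupBy.indices`) and then reorders the frame with `iloc`.

```python
import numpy as np
import pandas as pd


def shuffle_rows_keep_column(df: pd.DataFrame, col: str, seed=None) -> pd.DataFrame:
    """Shuffle whole rows, but only among rows with the same value in `col`,
    so the sequence of values in `col` is left unchanged."""
    rng = np.random.default_rng(seed)
    order = np.arange(len(df))
    # GroupBy.indices maps each group key to the integer positions of its rows.
    for positions in df.groupby(col).indices.values():
        # Each slot of this group receives a randomly chosen row from the same group.
        order[positions] = rng.permutation(positions)
    # iloc moves complete rows, so the index label travels with its row.
    return df.iloc[order]


# The frame from the question ("i" is the index).
df = pd.DataFrame(
    {"val": [2, 3, 4, 6, 5], "val2": [2, 3, 4, 5, 6], "ID": ["a", "b", "a", "b", "b"]},
    index=pd.Index(range(5), name="i"),
)
print(shuffle_rows_keep_column(df, "ID", seed=1))
```

Calling it again with the same `seed` returns the same arrangement; omit `seed` for a new shuffle each time.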

Upvotes: 1
