Shuffle rows of dataframe based on a condition

Question

I have a dataframe that contains an ID column, and I would like to shuffle rows that only have a certain ID.

An example of my dataframe is:

-------------------------------
   ID   |   Fruit   |   Color
-------------------------------
 1         apple       green
 2         orange      orange
 1         pear        green
 2         grapefruit  yellow
 1         banana      yellow
 2         tomato      red
 1         grape       black
 2         melon       yellow

Rather than shuffling the entire dataframe, which I can already do with df.sample(frac=1), I am trying to work out how to shuffle only the rows where ID=1. I tried the code below, which produced a syntax error.

df.apply(lambda x: df.sample(frac=1) if x['ID'] == 1)

jezrael · Accepted Answer

The idea is to filter the rows with a boolean mask (boolean indexing), shuffle them with sample, and assign the result back, converting the values to a NumPy array to prevent index alignment:

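# Boolean mask selecting the rows to shuffle
m = df['ID'] == 1

# Shuffle only the masked rows; to_numpy() drops the index so values are assigned by position
df[m] = df[m].sample(frac=1).to_numpy()
# older pandas versions
# df[m] = df[m].sample(frac=1).values
print(df)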
   ID       Fruit   Color
0   1        pear   green
1   2      orange  orange
2   1       grape   black
3   2  grapefruit  yellow
4   1       apple   green
5   2      tomato     red
6   1      banana  yellow
7   2       melon  yellow
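
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same approach. The helper name shuffle_where and the fixed random_state are my own additions for illustration (they are not part of the answer above); random_state is only there so the shuffle is repeatable, and the helper works on a copy so the original frame is left untouched.

```python
import pandas as pd


def shuffle_where(df, mask, random_state=None):
    """Return a copy of df in which only the rows selected by mask are shuffled.

    Hypothetical helper for illustration; it wraps the mask + sample + to_numpy idea.
    """
    out = df.copy()
    # sample(frac=1) returns the selected rows in random order;
    # to_numpy() discards the index so the rows are written back by position
    out[mask] = out[mask].sample(frac=1, random_state=random_state).to_numpy()
    return out


# The dataframe from the question
df = pd.DataFrame({
    "ID": [1, 2, 1, 2, 1, 2, 1, 2],
    "Fruit": ["apple", "orange", "pear", "grapefruit",
              "banana", "tomato", "grape", "melon"],
    "Color": ["green", "orange", "green", "yellow",
              "yellow", "red", "black", "yellow"],
})

shuffled = shuffle_where(df, df["ID"] == 1, random_state=0)
print(shuffled)
```

The rows with ID 2 stay where they are, while the ID 1 rows trade places among themselves. If the to_numpy() call is left out, the sampled rows keep their original index labels, so the assignment lines them back up with those labels and the shuffle is effectively lost.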
