Reputation: 895
I have a dataframe that contains an ID column, and I would like to shuffle only the rows that have a certain ID.
An example of my dataframe is:
-------------------------------
ID | Fruit      | Color
-------------------------------
1  | apple      | green
2  | orange     | orange
1  | pear       | green
2  | grapefruit | yellow
1  | banana     | yellow
2  | tomato     | red
1  | grape      | black
2  | melon      | yellow
Rather than shuffling the entire dataframe, which I can already do with df.sample(frac=1), I am trying to work out how to shuffle only the rows where ID=1. I have tried the code below, which produced a syntax error.
df.apply(lambda x: df.sample(frac=1) if x['ID'] == 1)
Upvotes: 1
Views: 561
Reputation: 862741
The idea is to filter the rows with a boolean mask (boolean indexing), shuffle them with sample, and assign the result back after converting the values to a NumPy array, which prevents index alignment:
m = df['ID'] == 1
df[m] = df[m].sample(frac=1).to_numpy()
# older pandas versions
# df[m] = df[m].sample(frac=1).values
print(df)
ID Fruit Color
0 1 pear green
1 2 orange orange
2 1 grape black
3 2 grapefruit yellow
4 1 apple green
5 2 tomato red
6 1 banana yellow
7 2 melon yellow
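To see why the conversion to a NumPy array matters, here is a minimal self-contained sketch that rebuilds the example dataframe and compares both assignments. The `random_state=0` seed and the names `original` and `aligned` are my own additions for illustration and are not part of the question; drop the seed if you want a different shuffle on every run.

```python
import pandas as pd

# Rebuild the example dataframe from the question.
df = pd.DataFrame({
    "ID": [1, 2, 1, 2, 1, 2, 1, 2],
    "Fruit": ["apple", "orange", "pear", "grapefruit",
              "banana", "tomato", "grape", "melon"],
    "Color": ["green", "orange", "green", "yellow",
              "yellow", "red", "black", "yellow"],
})
original = df.copy()  # kept only for comparison

# Boolean mask selecting the rows to shuffle.
m = df["ID"] == 1

# Without .to_numpy(): pandas aligns the shuffled rows on their index
# labels, so every row is written back to where it started.
aligned = df.copy()
aligned[m] = aligned[m].sample(frac=1, random_state=0)

# With .to_numpy(): the values are assigned by position, so the shuffled
# order is kept. random_state is an assumption added for reproducibility.
df[m] = df[m].sample(frac=1, random_state=0).to_numpy()

print(df)
```

Rows with ID 2 keep their positions, and each Fruit stays paired with its Color because whole rows are moved. The `aligned` copy ends up identical to the original data, which is the index alignment the answer is avoiding.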
Upvotes: 1