Reputation: 10083
How can I shuffle the last N rows in a pandas dataframe? When I say "shuffle", I mean to randomly change the order of rows. This is what I've tried so far. I can't figure out how to properly reset the index.
import pandas as pd
import numpy as np
dat = pd.DataFrame({'d1': np.linspace(0, 1, 10)})
pd.concat([dat[:5], dat[5:].sample(frac=1).reset_index(drop=True)])
output:
d1
0 0.000000
1 0.111111
2 0.222222
3 0.333333
4 0.444444
0 0.777778
1 0.666667
2 0.888889
3 1.000000
4 0.555556
Upvotes: 2
Views: 104
Reputation: 61910
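You can use np.random.shuffle directly. It shuffles in place, which works here because dat.values is a view of this single-dtype frame's data (that may not hold for mixed dtypes or with copy-on-write enabled):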
import pandas as pd
import numpy as np
np.random.seed(42)
dat = pd.DataFrame({'d1': np.linspace(0, 1, 10)})
np.random.shuffle(dat.values[5:])
print(dat)
Output
d1
0 0.000000
1 0.111111
2 0.222222
3 0.333333
4 0.444444
5 0.666667
6 1.000000
7 0.777778
8 0.555556
9 0.888889
Or, if you prefer, permutation:
import pandas as pd
import numpy as np
dat = pd.DataFrame({'d1': np.linspace(0, 1, 10)})
dat.values[5:] = np.random.permutation(dat.values[5:])
print(dat)
Output
d1
0 0.000000
1 0.111111
2 0.222222
3 0.333333
4 0.444444
5 0.555556
6 0.888889
7 0.777778
8 1.000000
9 0.666667
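Both snippets above write through dat.values, which depends on pandas handing back a writable view. As a hedged alternative, here is a minimal sketch that permutes row positions instead and never touches the underlying array. The helper name shuffle_last_n and its seed parameter are hypothetical, introduced only for illustration.

```python
import numpy as np
import pandas as pd


def shuffle_last_n(df, n, seed=None):
    """Return a copy of df with its last n rows in random order and a fresh default index."""
    if not 0 <= n <= len(df):
        raise ValueError("n must be between 0 and len(df)")
    rng = np.random.default_rng(seed)
    split = len(df) - n
    # Positions of the untouched head, followed by a random ordering of the tail positions.
    order = np.concatenate([np.arange(split), split + rng.permutation(n)])
    return df.iloc[order].reset_index(drop=True)


dat = pd.DataFrame({'d1': np.linspace(0, 1, 10)})
result = shuffle_last_n(dat, 5, seed=42)
print(result)
```

Because it selects whole rows by position, this version should also behave the same way on frames with several columns of different dtypes, and it leaves dat itself unchanged.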
Upvotes: 1
Reputation: 863146
For a default index, pass the parameter ignore_index=True to concat:
dat = pd.DataFrame({'d1': np.linspace(0, 1, 10)})
df = pd.concat([dat[:5], dat[5:].sample(frac=1)], ignore_index=True)
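To see what ignore_index=True does here, a small self-contained check can help. This is only a sketch: the random_state argument is an addition of mine to make the run repeatable and is not part of the original line.

```python
import numpy as np
import pandas as pd

dat = pd.DataFrame({'d1': np.linspace(0, 1, 10)})

# random_state is added here only so the shuffle is repeatable.
tail = dat[5:].sample(frac=1, random_state=0)
df = pd.concat([dat[:5], tail], ignore_index=True)

# The index is rebuilt from scratch, and the first five rows are untouched.
print(df.index.tolist())
print(df['d1'].iloc[:5].equals(dat['d1'].iloc[:5]))
```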
Another solution is to use sample only on the last rows and assign the result back as a NumPy array (via .values) to prevent alignment of indices:
dat.iloc[5:] = dat.iloc[5:].sample(frac=1).values
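A runnable version of the assign-back approach might look like the sketch below. It works on a copy (the name shuffled is mine) so the original frame stays available for comparison, uses to_numpy() in place of .values, and adds random_state only for repeatability.

```python
import numpy as np
import pandas as pd

dat = pd.DataFrame({'d1': np.linspace(0, 1, 10)})
shuffled = dat.copy()  # keep the original for comparison

# Converting the sampled rows to a plain array drops their index labels,
# so the values are written back purely by position.
shuffled.iloc[5:] = shuffled.iloc[5:].sample(frac=1, random_state=1).to_numpy()

print(shuffled)
```

The index of shuffled stays 0..9 because only the values of the last five rows are overwritten.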
A NumPy solution with np.random.shuffle, which works in place:
np.random.shuffle(dat.iloc[5:].values)
print(df)
d1
0 0.000000
1 0.111111
2 0.222222
3 0.333333
4 0.444444
5 0.666667
6 0.888889
7 1.000000
8 0.555556
9 0.777778
Upvotes: 2