pandas shuffle last N rows

Question

How can I shuffle the last N rows of a pandas DataFrame? By "shuffle" I mean randomly reordering the rows. This is what I've tried so far, but I can't figure out how to reset the index properly: the shuffled rows are renumbered from 0 instead of continuing from 5.

import pandas as pd
import numpy as np
dat = pd.DataFrame({'d1': np.linspace(0, 1, 10)})
pd.concat([dat[:5], dat[5:].sample(frac=1).reset_index(drop=True)])

output:

         d1
0  0.000000
1  0.111111
2  0.222222
3  0.333333
4  0.444444
0  0.777778
1  0.666667
2  0.888889
3  1.000000
4  0.555556

Dani Mesejo · Accepted Answer

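You can use np.random.shuffle directly on the underlying values, which reorders the rows in place and leaves the index untouched: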

import pandas as pd
import numpy as np

np.random.seed(42)

dat = pd.DataFrame({'d1': np.linspace(0, 1, 10)})
np.random.shuffle(dat.values[5:])
print(dat)

Output

         d1
0  0.000000
1  0.111111
2  0.222222
3  0.333333
4  0.444444
5  0.666667
6  1.000000
7  0.777778
8  0.555556
9  0.888889
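
The in-place shuffle relies on dat.values being a writable view of the frame's data. That holds for a single-dtype frame like this one, but as far as I know it does not hold for mixed-dtype frames or when pandas Copy-on-Write is enabled. A minimal sketch that avoids this assumption permutes the row positions and selects with iloc. The names shuffle_tail, n and seed are illustrative only and are not part of pandas:

```python
import numpy as np
import pandas as pd


def shuffle_tail(df, n, seed=None):
    """Return a copy of df with its last n rows in random order.

    Hypothetical helper for illustration; the result gets a fresh 0..len-1 index.
    """
    rng = np.random.default_rng(seed)
    positions = np.arange(len(df))
    head_len = max(len(df) - n, 0)  # rows before this position keep their order
    # Permute only the positions of the last n rows.
    positions[head_len:] = rng.permutation(positions[head_len:])
    return df.iloc[positions].reset_index(drop=True)


dat = pd.DataFrame({'d1': np.linspace(0, 1, 10)})
shuffled = shuffle_tail(dat, 5, seed=42)
print(shuffled)
```

Because this returns a new frame instead of writing into dat.values, the original dat is left unchanged.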

Or, if you prefer, np.random.permutation, which returns a shuffled copy that you assign back:

import pandas as pd
import numpy as np

dat = pd.DataFrame({'d1': np.linspace(0, 1, 10)})
dat.values[5:] = np.random.permutation(dat.values[5:])

print(dat)

Output (one possible result, since no seed is set here)

         d1
0  0.000000
1  0.111111
2  0.222222
3  0.333333
4  0.444444
5  0.555556
6  0.888889
7  0.777778
8  1.000000
9  0.666667
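
The attempt in the question is also close to working. The duplicate labels appear because reset_index is applied to the shuffled tail before the concatenation, so only that half is renumbered. A minimal sketch, assuming a default integer index is what is wanted, passes ignore_index=True to pd.concat so the combined frame is renumbered as a whole. The random_state=42 argument is only there to make the run repeatable:

```python
import numpy as np
import pandas as pd

dat = pd.DataFrame({'d1': np.linspace(0, 1, 10)})

# Shuffle only the last 5 rows, then let concat renumber the combined frame 0..9.
result = pd.concat(
    [dat.iloc[:5], dat.iloc[5:].sample(frac=1, random_state=42)],
    ignore_index=True,
)
print(result)
```

Calling .reset_index(drop=True) on the result of pd.concat instead should give the same index.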
