kilojoules

Reputation: 10083

pandas shuffle last N rows

How can I shuffle the last N rows in a pandas dataframe? When I say "shuffle", I mean to randomly change the order of rows. This is what I've tried so far. I can't figure out how to properly reset the index.

import pandas as pd
import numpy as np
dat = pd.DataFrame({'d1': np.linspace(0, 1, 10)})
pd.concat([dat[:5], dat[5:].sample(frac=1).reset_index(drop=True)])

output:

         d1
0  0.000000
1  0.111111
2  0.222222
3  0.333333
4  0.444444
0  0.777778
1  0.666667
2  0.888889
3  1.000000
4  0.555556

Upvotes: 2

Views: 104

Answers (2)

Dani Mesejo

Reputation: 61910

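You can use np.random.shuffle directly. It shuffles in place, so this relies on dat.values returning a view of the frame's data, which holds for a single-dtype frame like this one but is not guaranteed in general: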

import pandas as pd
import numpy as np

np.random.seed(42)

dat = pd.DataFrame({'d1': np.linspace(0, 1, 10)})
np.random.shuffle(dat.values[5:])
print(dat)

Output

         d1
0  0.000000
1  0.111111
2  0.222222
3  0.333333
4  0.444444
5  0.666667
6  1.000000
7  0.777778
8  0.555556
9  0.888889

Or, if you prefer, np.random.permutation, which returns a shuffled copy that is then assigned back:

import pandas as pd
import numpy as np

dat = pd.DataFrame({'d1': np.linspace(0, 1, 10)})
dat.values[5:] = np.random.permutation(dat.values[5:])

print(dat)

Output

         d1
0  0.000000
1  0.111111
2  0.222222
3  0.333333
4  0.444444
5  0.555556
6  0.888889
7  0.777778
8  1.000000
9  0.666667
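
Both snippets above write through dat.values. As a minimal sketch of a variant that does not depend on that, the positions of the last N rows can be permuted and passed to iloc. The helper name shuffle_last_n and its seed parameter are hypothetical, introduced only for illustration:

```python
import numpy as np
import pandas as pd


def shuffle_last_n(df, n, seed=None):
    """Return a copy of df with its last n rows in random order (hypothetical helper)."""
    n = max(0, min(n, len(df)))          # clamp n to the valid range
    rng = np.random.default_rng(seed)    # optional seed for reproducibility
    split = len(df) - n
    head = np.arange(split)                             # positions left untouched
    tail = rng.permutation(np.arange(split, len(df)))   # shuffled positions
    order = np.concatenate([head, tail])
    # iloc reorders by position; reset_index restores the default 0..len-1 index
    return df.iloc[order].reset_index(drop=True)


dat = pd.DataFrame({'d1': np.linspace(0, 1, 10)})
out = shuffle_last_n(dat, 5, seed=42)
print(out)
```

This returns a new frame and leaves dat unchanged, which may or may not be what you want compared with the in-place versions.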

Upvotes: 1

jezrael

Reputation: 863146

To get a default index, add the parameter ignore_index=True to concat:

dat = pd.DataFrame({'d1': np.linspace(0, 1, 10)})
df = pd.concat([dat[:5], dat[5:].sample(frac=1)], ignore_index=True)
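
To see why the attempt in the question produces a duplicated index, here is a small side-by-side sketch. The names attempt and fixed, and the random_state value, are assumptions added for illustration:

```python
import numpy as np
import pandas as pd

dat = pd.DataFrame({'d1': np.linspace(0, 1, 10)})

# Resetting the index on the shuffled tail alone restarts it at 0,
# so the concatenated frame carries the labels 0..4 twice.
attempt = pd.concat([dat[:5], dat[5:].sample(frac=1, random_state=1).reset_index(drop=True)])

# ignore_index=True discards both pieces' labels and renumbers the result.
fixed = pd.concat([dat[:5], dat[5:].sample(frac=1, random_state=1)], ignore_index=True)

print(attempt.index.tolist())  # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
print(fixed.index.tolist())    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```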

Another solution is to call sample only on the last rows and assign the result back as a NumPy array via .values, which prevents index alignment:

dat.iloc[5:] = dat.iloc[5:].sample(frac=1).values
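
A self-contained sketch of the same idea, assuming you want "last n rows" for a variable n and a reproducible shuffle. The names n and shuffled_tail and the random_state value are illustrative, and to_numpy() is used here in place of .values:

```python
import numpy as np
import pandas as pd

dat = pd.DataFrame({'d1': np.linspace(0, 1, 10)})
n = 5  # number of trailing rows to shuffle

# sample(frac=1) returns all selected rows in random order, keeping their labels
shuffled_tail = dat.iloc[-n:].sample(frac=1, random_state=0)

# A plain array carries no index labels, so the values are written back
# purely by position and the frame's index stays 0..9.
dat.iloc[-n:] = shuffled_tail.to_numpy()
print(dat)
```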

NumPy solution with np.random.shuffle, which works in place (this relies on .values returning a view of the frame's data):

np.random.shuffle(dat.iloc[5:].values)

print(df)  # df is the result of the concat solution above
         d1
0  0.000000
1  0.111111
2  0.222222
3  0.333333
4  0.444444
5  0.666667
6  0.888889
7  1.000000
8  0.555556
9  0.777778

Upvotes: 2
