Reputation: 123
I want to shuffle a DataFrame while keeping sets of rows together. The number of rows in each set is not constant, but I have a column that marks the rows of each set with the same ID.
For example, in the data below the first column is the marker specifying which rows need to stay together while shuffling.
2 56.00 1 0.83 2.16 3147890 3120000.00 1 201.00 0 -201.00 116.00 75.88 201.00 232.00 105.74 201.00 168.00 75.88 46 -201.00
2 56.00 1 0.83 2.16 3147890 3120000.00 1 201.00 0 -201.00 116.00 75.88 201.00 232.00 105.74 201.00 168.00 75.88 4 -201.00
2 56.00 1 0.83 2.16 3147890 3120000.00 1 201.00 0 -201.00 116.00 75.88 201.00 232.00 105.74 201.00 168.00 75.88 39 -201.00
2 56.00 1 0.83 2.16 3147890 3120000.00 1 201.00 0 -201.00 116.00 75.88 201.00 232.00 105.74 201.00 168.00 75.88 10 -201.00
2 56.00 1 0.83 2.16 3147890 3120000.00 1 201.00 0 -201.00 116.00 75.88 201.00 232.00 105.74 201.00 168.00 75.88 7 -135.00
2 56.00 1 0.83 2.16 3147890 3120000.00 1 201.00 0 -201.00 116.00 75.88 201.00 232.00 105.74 201.00 168.00 75.88 0 -201.00
2 56.00 1 0.83 2.16 3147890 3120000.00 1 201.00 0 -201.00 116.00 75.88 201.00 232.00 105.74 201.00 168.00 75.88 35 -201.00
2 56.00 1 0.83 2.16 3147890 3120000.00 1 201.00 0 -201.00 116.00 75.88 201.00 232.00 105.74 201.00 168.00 75.88 5 -201.00
2 56.00 1 0.83 2.16 3147890 3120000.00 1 201.00 0 -201.00 116.00 75.88 201.00 232.00 105.74 201.00 168.00 75.88 47 -201.00
2 56.00 1 0.83 2.16 3147890 3120000.00 1 201.00 0 -201.00 116.00 75.88 201.00 232.00 105.74 201.00 168.00 75.88 12 -201.00
2 56.00 1 0.83 2.16 3147890 3120000.00 1 201.00 0 -201.00 116.00 75.88 201.00 232.00 105.74 201.00 168.00 75.88 13 -201.00
2 56.00 1 0.83 2.16 3147890 3120000.00 1 201.00 0 -201.00 116.00 75.88 201.00 232.00 105.74 201.00 168.00 75.88 20 -201.00
2 56.00 1 0.83 2.16 3147890 3120000.00 1 201.00 0 -201.00 116.00 75.88 201.00 232.00 105.74 201.00 168.00 75.88 42 -201.00
4 93.00 1 0.34 3.62 4121000 5340000.00 1 135.00 0 -135.00 78.00 120.53 135.00 10.00 2.67 135.00 313.00 120.53 46 -135.00
4 93.00 1 0.34 3.62 4121000 5340000.00 1 135.00 0 -135.00 78.00 120.53 135.00 10.00 2.67 135.00 313.00 120.53 4 -95.00
4 93.00 1 0.34 3.62 4121000 5340000.00 1 135.00 0 -135.00 78.00 120.53 135.00 10.00 2.67 135.00 313.00 120.53 39 -46.00
4 93.00 1 0.34 3.62 4121000 5340000.00 1 135.00 0 -135.00 78.00 120.53 135.00 10.00 2.67 135.00 313.00 120.53 10 -135.00
4 93.00 1 0.34 3.62 4121000 5340000.00 1 135.00 0 -135.00 78.00 120.53 135.00 10.00 2.67 135.00 313.00 120.53 7 -135.00
4 93.00 1 0.34 3.62 4121000 5340000.00 1 135.00 0 -135.00 78.00 120.53 135.00 10.00 2.67 135.00 313.00 120.53 0 -135.00
4 93.00 1 0.34 3.62 4121000 5340000.00 1 135.00 0 -135.00 78.00 120.53 135.00 10.00 2.67 135.00 313.00 120.53 35 -135.00
4 93.00 1 0.34 3.62 4121000 5340000.00 1 135.00 0 -135.00 78.00 120.53 135.00 10.00 2.67 135.00 313.00 120.53 5 -135.00
4 93.00 1 0.34 3.62 4121000 5340000.00 1 135.00 0 -135.00 78.00 120.53 135.00 10.00 2.67 135.00 313.00 120.53 47 -135.00
4 93.00 1 0.34 3.62 4121000 5340000.00 1 135.00 0 -135.00 78.00 120.53 135.00 10.00 2.67 135.00 313.00 120.53 12 -135.00
4 93.00 1 0.34 3.62 4121000 5340000.00 1 135.00 0 -135.00 78.00 120.53 135.00 10.00 2.67 135.00 313.00 120.53 13 -135.00
4 93.00 1 0.34 3.62 4121000 5340000.00 1 135.00 0 -135.00 78.00 120.53 135.00 10.00 2.67 135.00 313.00 120.53 20 -135.00
4 93.00 1 0.34 3.62 4121000 5340000.00 1 135.00 0 -135.00 78.00 120.53 135.00 10.00 2.67 135.00 313.00 120.53 42 -135.00
6 74.00 0 2.35 2.89 1680840 2940000.00 11 2758.00 0 -2758.00 296.00 74.46 261.00 176.00 75.84 304.00 304.00 74.46 46 -2730.00
6 74.00 0 2.35 2.89 1680840 2940000.00 11 2758.00 0 -2758.00 296.00 74.46 261.00 176.00 75.84 304.00 304.00 74.46 4 -2458.00
6 74.00 0 2.35 2.89 1680840 2940000.00 11 2758.00 0 -2758.00 296.00 74.46 261.00 176.00 75.84 304.00 304.00 74.46 39 -2758.00
6 74.00 0 2.35 2.89 1680840 2940000.00 11 2758.00 0 -2758.00 296.00 74.46 261.00 176.00 75.84 304.00 304.00 74.46 10 -2758.00
6 74.00 0 2.35 2.89 1680840 2940000.00 11 2758.00 0 -2758.00 296.00 74.46 261.00 176.00 75.84 304.00 304.00 74.46 7 -2554.00
6 74.00 0 2.35 2.89 1680840 2940000.00 11 2758.00 0 -2758.00 296.00 74.46 261.00 176.00 75.84 304.00 304.00 74.46 0 -2568.00
Upvotes: 2
Views: 781
Reputation: 3491
It is unclear what end result you are looking for, but regardless, the first step is probably the same: split the DataFrame into separate DataFrames based on that column, then shuffle and recombine them as desired.
Recombining can be done by storing the shuffled DataFrames in a list and passing it to pd.concat. You can optionally shuffle the list first:
from random import shuffle
shuffle(dfs)
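A minimal self-contained sketch of the split, shuffle-the-list, and concatenate steps, assuming a small made-up DataFrame whose column "id" (a hypothetical name) plays the role of the marker column. Note that random.shuffle reorders the list in place and returns None, so it is called on its own line rather than assigned:

```python
import random

import pandas as pd

# Toy data (hypothetical): "id" marks which rows belong together
df = pd.DataFrame({"id": [2, 2, 2, 3, 3, 3, 3, 4, 4],
                   "val": ["a2", "b2", "c2", "a3", "b3", "c3", "d3", "a4", "b4"]})

# Split into one DataFrame per id value
dfs = [g for _, g in df.groupby("id")]

# Shuffle the order of the groups in place (random.shuffle returns None)
random.shuffle(dfs)

# Reassemble; rows inside each group keep their original relative order
shuffled = pd.concat(dfs)
print(shuffled)
```

Because only the list is shuffled, each group moves as a block; add a per-group sample(frac=1) if the rows inside a group should be shuffled as well.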
Using this data set:
2 a2
2 b2
2 c2
3 a3
3 b3
3 c3
3 d3
4 a4
4 b4
This code:
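import pandas as pd
from random import shuffle

# Column 0 holds the group ID; sep=r"\s+" splits on any run of whitespace
df = pd.read_csv("shuffle.txt", header=None, sep=r"\s+")
# One DataFrame per group ID
dfs = [x for _, x in df.groupby(df[0])]
#shuffle(dfs)  # uncomment to also shuffle the order of the groups
new_dfs = []
for group in dfs:
    # Shuffle the rows within this group
    group = group.sample(frac=1)
    new_dfs.append(group)
final_df = pd.concat(new_dfs)
print(final_df)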
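The code above reads from shuffle.txt. To try it without creating that file, the same data can be read from an in-memory string with io.StringIO. This sketch also seeds both the group-order shuffle and sample (the seed values 0 and 1 are arbitrary, chosen only to make runs repeatable). sep=r"\s+" splits on any whitespace, which is what delim_whitespace=True did; recent pandas versions deprecate the latter:

```python
import io
import random

import pandas as pd

# Same data as shuffle.txt, kept in memory instead of a file
data = """2 a2
2 b2
2 c2
3 a3
3 b3
3 c3
3 d3
4 a4
4 b4
"""

df = pd.read_csv(io.StringIO(data), header=None, sep=r"\s+")

# One DataFrame per value in column 0
dfs = [x for _, x in df.groupby(df[0])]

# Optional: shuffle the order of the groups (seeded for repeatability)
random.Random(0).shuffle(dfs)

# Shuffle the rows within each group, then reassemble
final_df = pd.concat([g.sample(frac=1, random_state=1) for g in dfs])
print(final_df)
```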
Gets you:
0 1
2 2 c2
0 2 a2
1 2 b2
5 3 c3
3 3 a3
6 3 d3
4 3 b3
8 4 b4
7 4 a4
Uncommenting the shuffle line gets you:
0 1
8 4 b4
7 4 a4
6 3 d3
5 3 c3
4 3 b3
3 3 a3
0 2 a2
1 2 b2
2 2 c2
Upvotes: 1
Reputation: 153490
You can use a generator with np.random.choice on the unique values of col1, then pd.concat to reassemble the "groups":
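import numpy as np
import pandas as pd
# replace=False draws each unique ID exactly once, so no group is duplicated or dropped
order = np.random.choice(df['col1'].unique(), df['col1'].nunique(), replace=False)
# Boolean-select each group and concatenate the groups in the random order
pd.concat(df[df['col1'] == i] for i in order)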
In detail: first get the unique values of 'col1' as an array using unique, then draw them in random order with np.random.choice (pass replace=False so each value is drawn exactly once; the default samples with replacement, which can repeat some groups and drop others). Use that ordering to boolean-select the parts ("groups") of the DataFrame inside a generator using for-in syntax, and finally use pd.concat to reassemble the DataFrame with the groups in random order.
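An alternative to boolean-masking once per group, sketched here with NumPy's Generator.permutation (a swapped-in technique, not the one in this answer): give each unique ID a random rank, then do a stable sort of the rows by that rank. Each mask in the generator scans the whole DataFrame, so this single-sort version may scale better when there are many groups. The DataFrame, its col1 and x columns, and the seed 42 are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data: col1 marks which rows belong together
df = pd.DataFrame({"col1": [2, 2, 2, 4, 4, 6, 6, 6],
                   "x": range(8)})

rng = np.random.default_rng(42)

# Random order of the unique IDs; each ID appears exactly once
order = rng.permutation(df["col1"].unique())

# Map each ID to its position in the random order
rank = pd.Series(np.arange(len(order)), index=order)

# Stable sort by rank: groups move as blocks, rows within a group keep their order
shuffled = (df.assign(_rank=df["col1"].map(rank))
              .sort_values("_rank", kind="stable")
              .drop(columns="_rank"))
print(shuffled)
```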
Upvotes: 5