sranga
sranga

Reputation: 123

how to shuffle set of rows together (rows have unique id) in pandas

I want to shuffle the dataframe keeping set of rows together. The number of rows together is not constant but I have a column marking them with same Id.

For EX: In the below data first column is the unique marker specifying which rows needs to be together while shuffling.

2 56.00 1 0.83 2.16 3147890 3120000.00 1 201.00 0 -201.00 116.00 75.88 201.00 232.00 105.74 201.00 168.00 75.88 46 -201.00
2 56.00 1 0.83 2.16 3147890 3120000.00 1 201.00 0 -201.00 116.00 75.88 201.00 232.00 105.74 201.00 168.00 75.88 4 -201.00
2 56.00 1 0.83 2.16 3147890 3120000.00 1 201.00 0 -201.00 116.00 75.88 201.00 232.00 105.74 201.00 168.00 75.88 39 -201.00
2 56.00 1 0.83 2.16 3147890 3120000.00 1 201.00 0 -201.00 116.00 75.88 201.00 232.00 105.74 201.00 168.00 75.88 10 -201.00
2 56.00 1 0.83 2.16 3147890 3120000.00 1 201.00 0 -201.00 116.00 75.88 201.00 232.00 105.74 201.00 168.00 75.88 7 -135.00
2 56.00 1 0.83 2.16 3147890 3120000.00 1 201.00 0 -201.00 116.00 75.88 201.00 232.00 105.74 201.00 168.00 75.88 0 -201.00
2 56.00 1 0.83 2.16 3147890 3120000.00 1 201.00 0 -201.00 116.00 75.88 201.00 232.00 105.74 201.00 168.00 75.88 35 -201.00
2 56.00 1 0.83 2.16 3147890 3120000.00 1 201.00 0 -201.00 116.00 75.88 201.00 232.00 105.74 201.00 168.00 75.88 5 -201.00
2 56.00 1 0.83 2.16 3147890 3120000.00 1 201.00 0 -201.00 116.00 75.88 201.00 232.00 105.74 201.00 168.00 75.88 47 -201.00
2 56.00 1 0.83 2.16 3147890 3120000.00 1 201.00 0 -201.00 116.00 75.88 201.00 232.00 105.74 201.00 168.00 75.88 12 -201.00
2 56.00 1 0.83 2.16 3147890 3120000.00 1 201.00 0 -201.00 116.00 75.88 201.00 232.00 105.74 201.00 168.00 75.88 13 -201.00
2 56.00 1 0.83 2.16 3147890 3120000.00 1 201.00 0 -201.00 116.00 75.88 201.00 232.00 105.74 201.00 168.00 75.88 20 -201.00
2 56.00 1 0.83 2.16 3147890 3120000.00 1 201.00 0 -201.00 116.00 75.88 201.00 232.00 105.74 201.00 168.00 75.88 42 -201.00
4 93.00 1 0.34 3.62 4121000 5340000.00 1 135.00 0 -135.00 78.00 120.53 135.00 10.00 2.67 135.00 313.00 120.53 46 -135.00
4 93.00 1 0.34 3.62 4121000 5340000.00 1 135.00 0 -135.00 78.00 120.53 135.00 10.00 2.67 135.00 313.00 120.53 4 -95.00 
4 93.00 1 0.34 3.62 4121000 5340000.00 1 135.00 0 -135.00 78.00 120.53 135.00 10.00 2.67 135.00 313.00 120.53 39 -46.00 
4 93.00 1 0.34 3.62 4121000 5340000.00 1 135.00 0 -135.00 78.00 120.53 135.00 10.00 2.67 135.00 313.00 120.53 10 -135.00
4 93.00 1 0.34 3.62 4121000 5340000.00 1 135.00 0 -135.00 78.00 120.53 135.00 10.00 2.67 135.00 313.00 120.53 7 -135.00
4 93.00 1 0.34 3.62 4121000 5340000.00 1 135.00 0 -135.00 78.00 120.53 135.00 10.00 2.67 135.00 313.00 120.53 0 -135.00
4 93.00 1 0.34 3.62 4121000 5340000.00 1 135.00 0 -135.00 78.00 120.53 135.00 10.00 2.67 135.00 313.00 120.53 35 -135.00
4 93.00 1 0.34 3.62 4121000 5340000.00 1 135.00 0 -135.00 78.00 120.53 135.00 10.00 2.67 135.00 313.00 120.53 5 -135.00
4 93.00 1 0.34 3.62 4121000 5340000.00 1 135.00 0 -135.00 78.00 120.53 135.00 10.00 2.67 135.00 313.00 120.53 47 -135.00
4 93.00 1 0.34 3.62 4121000 5340000.00 1 135.00 0 -135.00 78.00 120.53 135.00 10.00 2.67 135.00 313.00 120.53 12 -135.00
4 93.00 1 0.34 3.62 4121000 5340000.00 1 135.00 0 -135.00 78.00 120.53 135.00 10.00 2.67 135.00 313.00 120.53 13 -135.00
4 93.00 1 0.34 3.62 4121000 5340000.00 1 135.00 0 -135.00 78.00 120.53 135.00 10.00 2.67 135.00 313.00 120.53 20 -135.00
4 93.00 1 0.34 3.62 4121000 5340000.00 1 135.00 0 -135.00 78.00 120.53 135.00 10.00 2.67 135.00 313.00 120.53 42 -135.00
6 74.00 0 2.35 2.89 1680840 2940000.00 11 2758.00 0 -2758.00 296.00 74.46 261.00 176.00 75.84 304.00 304.00 74.46 46 -2730.00
6 74.00 0 2.35 2.89 1680840 2940000.00 11 2758.00 0 -2758.00 296.00 74.46 261.00 176.00 75.84 304.00 304.00 74.46 4 -2458.00
6 74.00 0 2.35 2.89 1680840 2940000.00 11 2758.00 0 -2758.00 296.00 74.46 261.00 176.00 75.84 304.00 304.00 74.46 39 -2758.00
6 74.00 0 2.35 2.89 1680840 2940000.00 11 2758.00 0 -2758.00 296.00 74.46 261.00 176.00 75.84 304.00 304.00 74.46 10 -2758.00
6 74.00 0 2.35 2.89 1680840 2940000.00 11 2758.00 0 -2758.00 296.00 74.46 261.00 176.00 75.84 304.00 304.00 74.46 7 -2554.00
6 74.00 0 2.35 2.89 1680840 2940000.00 11 2758.00 0 -2758.00 296.00 74.46 261.00 176.00 75.84 304.00 304.00 74.46 0 -2568.00

Upvotes: 2

Views: 781

Answers (2)

Zev
Zev

Reputation: 3491

It is unclear what end result you are looking for but regardless, the first step is probably the same. Group the dataframes into separate ones based on that column. Shuffle and recombine as desired.

Recombining can be done by storing the shuffled dataframes as a list and then pd.concat. You can optionally shuffle the list first:

from random import shuffle
shuffle(dfs)    

Using this data set:

2 a2
2 b2
2 c2
3 a3
3 b3
3 c3
3 d3
4 a4
4 b4

This code:

import pandas as pd

df =  pd.read_csv("shuffle.txt", header=None, delim_whitespace=True)
dfs = [x for _, x in df.groupby(df[0])]
from random import shuffle
#shuffle(dfs)
new_dfs = []
for df in dfs:
    df = df.sample(frac=1)
    new_dfs.append(df)

final_df = pd.concat(new_dfs)
print(final_df)

Gets you:

   0   1
2  2  c2
0  2  a2
1  2  b2
5  3  c3
3  3  a3
6  3  d3
4  3  b3
8  4  b4
7  4  a4

Uncommenting the shuffle line gets you:

   0   1
8  4  b4
7  4  a4
6  3  d3
5  3  c3
4  3  b3
3  3  a3
0  2  a2
1  2  b2
2  2  c2

Upvotes: 1

Scott Boston
Scott Boston

Reputation: 153490

You can use this generator with np.random.choice on unique col1, the pd.concat to re-assemble "groups".

import numpy as np
pd.concat((df[df['col1'] == i] for i in np.random.choice(df['col1'].unique(),
                                                         df['col1'].nunique())))

Details, first get the unique values from 'col1' as list using unique, then select random elements from this list using np.random.choice. Use that selection to boolean select parts ("group") of the dataframe inside a generator using for-in syntax and lastly, use pd.concat to re-assemble the dataframe in to random groups.

Upvotes: 5

Related Questions