Herwini

Reputation: 437

Shuffle columns in pairs

I have the following dataframe:

import numpy as np
import pandas as pd

d = {'Name1': ['jaap', 'piet', 'tim'], 'Name2': ['bas', 'max', 'piet'], 'Count1': [1, 5, 2], 'Count2': [2, 6, 8], 'Win': [1, 2, 2]}

data = pd.DataFrame(d)

  Name1 Name2  Count1  Count2   Win
0  jaap   bas       1       2     1
1  piet   max       5       6     2
2   tim  piet       2       8     2

Now I want to randomly shuffle the columns in pairs, row by row. Count1 belongs to Name1 and Count2 belongs to Name2, so whenever the names in Name1 and Name2 are swapped in a row, the values in Count1 and Count2 must be swapped in that row as well. In addition, the value in the last column, Win, must be changed from 2 to 1 (and vice versa) in every row where a swap is applied.

Example output would be:

  Name1 Name2  Count1  Count2  Win
0   bas  jaap       2       1    2
1  piet   max       5       6    2
2  piet   tim       8       2    1

Here, rows 0 and 2 have been swapped.

What I have tried:

np.apply_along_axis(np.random.permutation, 1, data[['Name1','Name2']])

np.apply_along_axis(np.random.permutation, 1, data[['Count1','Count2']])

This, however, doesn't ensure that the same shuffle is applied to Name1 and Name2 as to Count1 and Count2.

And:

data['random'] = np.random.choice(2,len(data))
data['random1'] = data['random'].replace([1,0],[0,1])

name1 = data['Name1'].copy()
name2 = data['Name2'].copy()
count1 = data['Count1'].copy()
count2 = data['Count2'].copy()
# random == 1 keeps the original order, random1 == 1 swaps the pair
data['Name1'] = name1 * data['random'] + name2 * data['random1']
data['Name2'] = name1 * data['random1'] + name2 * data['random']
data['Count1'] = count1 * data['random'] + count2 * data['random1']
data['Count2'] = count1 * data['random1'] + count2 * data['random']

The second approach works for the Name and Count column pairs, but not for the last column, Win. I am looking for a better method that can easily be applied to multiple column pairs.

Upvotes: 2

Views: 453

Answers (1)

Shubham Sharma

Reputation: 71689

We can draw a random sample and apply argsort to it to obtain randomly shuffled column indices, which can then be used to shuffle the given columns along axis=1. To update the Win column, we can build a mask that checks whether the order of the shuffled indices has changed; where it has, the values in Win are replaced using the reverse mapping.
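As a small standalone illustration of the index trick described above, here is a sketch on a toy array. The array `values`, the seed, and the variable names are assumptions made up for this example, not part of the question's data.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only to make the sketch reproducible

# Toy data (hypothetical): three rows, one pair of values per row
values = np.array([[10, 11], [20, 21], [30, 31]])

# Sorting two random numbers per row yields either [0, 1] (keep) or [1, 0] (swap)
idx = rng.random((len(values), 2)).argsort(axis=1)

# Reorder each row according to its own index pair
shuffled = np.take_along_axis(values, idx, axis=1)

# True for the rows whose order was reversed
swapped = idx[:, 0] == 1
```

Because the same `idx` can be reused for every column pair, all pairs in a given row are swapped together or not at all.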

c1 = ['Name1', 'Name2']
c2 = ['Count1', 'Count2']

i = np.random.rand(len(data), 2).argsort(1)

data[c1] = np.take_along_axis(data[c1].values, i, axis=1)
data[c2] = np.take_along_axis(data[c2].values, i, axis=1)

data['Win'] = data['Win'].where((i == [0, 1]).all(1), data['Win'].map({1:2, 2:1}))

  Name1 Name2  Count1  Count2  Win
0   bas  jaap       2       1    2
1  piet   max       5       6    2
2  piet   tim       8       2    1
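
Since the question asks for something that extends to several column pairs, the same idea can be wrapped in a helper. This is only a sketch: `swap_pairs`, its parameters, and the returned mask are hypothetical names introduced here, and it uses a per-row boolean coin flip with `np.where` instead of `argsort`, which is equivalent when every pair has exactly two columns.

```python
import numpy as np
import pandas as pd


def swap_pairs(df, pairs, win_col, rng=None):
    """Swap every (left, right) column pair on a random subset of rows.

    Hypothetical helper for illustration; returns a new frame and the swap mask.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = df.copy()
    swap = rng.random(len(out)) < 0.5  # one coin flip per row
    for left, right in pairs:
        a = out[left].to_numpy()
        b = out[right].to_numpy()
        out[left] = np.where(swap, b, a)   # take the right value where swapped
        out[right] = np.where(swap, a, b)  # take the left value where swapped
    # Flip 1 <-> 2 in the win column only on swapped rows
    flipped = out[win_col].map({1: 2, 2: 1})
    out[win_col] = np.where(swap, flipped, out[win_col])
    return out, swap


data = pd.DataFrame({
    'Name1': ['jaap', 'piet', 'tim'], 'Name2': ['bas', 'max', 'piet'],
    'Count1': [1, 5, 2], 'Count2': [2, 6, 8], 'Win': [1, 2, 2],
})
pairs = [('Name1', 'Name2'), ('Count1', 'Count2')]
out, swap = swap_pairs(data, pairs, 'Win', rng=np.random.default_rng(1))
```

Adding another pair is then a matter of appending one more tuple to `pairs`; the original `data` is left untouched because the helper works on a copy.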

Upvotes: 1
