I have the following dataframe:
import pandas as pd

d = {'Name1': ['jaap', 'piet', 'tim'], 'Name2': ['bas', 'max', 'piet'],
     'Count1': [1, 5, 2], 'Count2': [2, 6, 8], 'Win': [1, 2, 2]}
data = pd.DataFrame(d)
Name1 Name2 Count1 Count2 Win
0 jaap bas 1 2 1
1 piet max 5 6 2
2 tim piet 2 8 2
Now I want to randomly shuffle the columns in pairs, row by row. Count1 belongs to Name1 and Count2 belongs to Name2, so if the name in column Name1 is swapped with the name in Name2, the value in Count1 must also be swapped with the value in Count2. In addition, the value in the last column, Win, must change from 2 to 1 (and vice versa) whenever a row is shuffled.
Example output would be:
Name1 Name2 Count1 Count2 Win
0 bas jaap 2 1 2
1 piet max 5 6 2
2 piet tim 8 2 1
Here, rows 0 and 2 are shuffled.
What I have tried:
import numpy as np

np.apply_along_axis(np.random.permutation, 1, data[['Name1', 'Name2']])
np.apply_along_axis(np.random.permutation, 1, data[['Count1', 'Count2']])
However, this does not ensure that the same shuffle is applied to Name1 and Name2 as to Count1 and Count2, because each call draws its own random permutations.
I also tried:
# 1 = keep the row as-is, 0 = swap the pairs
data['random'] = np.random.choice(2, len(data))
data['random1'] = data['random'].replace([1, 0], [0, 1])
name1 = data['Name1'].copy()
name2 = data['Name2'].copy()
count1 = data['Count1'].copy()
count2 = data['Count2'].copy()
# A string times 1 is the string itself, times 0 is '', so only one term survives
data['Name1'] = name1 * data['random'] + name2 * data['random1']
data['Name2'] = name1 * data['random1'] + name2 * data['random']
data['Count1'] = count1 * data['random'] + count2 * data['random1']
data['Count2'] = count1 * data['random1'] + count2 * data['random']
The second approach works for the Name and Count column pairs, but not for the last column, Win. I am looking for a better method that can easily be applied to multiple column pairs.
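One way to extend the second approach, sketched below under the assumption that Win only holds 1 or 2: reuse the same random column as a row mask, so Win is flipped in exactly the rows whose pairs were swapped. This sketch swaps with np.where instead of the string/number multiplication trick and loops over a list of pairs, so more pairs can be added; the names original and swap are illustrative.

```python
import numpy as np
import pandas as pd

d = {'Name1': ['jaap', 'piet', 'tim'], 'Name2': ['bas', 'max', 'piet'],
     'Count1': [1, 5, 2], 'Count2': [2, 6, 8], 'Win': [1, 2, 2]}
data = pd.DataFrame(d)
original = data.copy()  # untouched copy to read the pre-swap values from

# Same convention as the question: 1 = keep the row, 0 = swap the pairs
data['random'] = np.random.choice(2, len(data))
swap = data['random'] == 0

# Every pair uses the same row mask, so the pairs always move together
for left, right in [('Name1', 'Name2'), ('Count1', 'Count2')]:
    a, b = original[left], original[right]
    data[left] = np.where(swap, b, a)
    data[right] = np.where(swap, a, b)

# Flip Win (1 <-> 2) only in the swapped rows, using the same mask
data['Win'] = original['Win'].where(~swap, original['Win'].map({1: 2, 2: 1}))
print(data)
```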
We can generate a random sample followed by argsort to obtain randomly shuffled indices, which can then be used to shuffle the given columns along axis=1. To update the Win column, we can create a mask that checks the order of the shuffled indices: if the order has changed, the values in Win are substituted using a reverse mapping.
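To see what the argsort step produces, here is a small standalone sketch on a toy array. It uses a seeded np.random.default_rng only to make the demo repeatable, where the answer uses np.random.rand; the toy values 10 and 20 are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the demo is repeatable
n = 6

# argsort of two random numbers per row yields either [0, 1] or [1, 0]
i = rng.random((n, 2)).argsort(axis=1)

# Apply each row's permutation to a toy two-column array
values = np.tile([10, 20], (n, 1))
shuffled = np.take_along_axis(values, i, axis=1)

# True where the row kept its original order, False where it was swapped
unchanged = (i == [0, 1]).all(axis=1)

print(i)
print(shuffled)
print(unchanged)
```

Because the same i is reused for every column pair, all pairs in a row are either kept or swapped together, and unchanged is exactly the mask needed to decide where Win should be flipped.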
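import numpy as np

c1 = ['Name1', 'Name2']
c2 = ['Count1', 'Count2']
# One random permutation of [0, 1] per row, shared by both column pairs
i = np.random.rand(len(data), 2).argsort(1)
data[c1] = np.take_along_axis(data[c1].values, i, axis=1)
data[c2] = np.take_along_axis(data[c2].values, i, axis=1)
# Keep Win where i is still [0, 1]; elsewhere flip it (1 <-> 2)
data['Win'] = data['Win'].where((i == [0, 1]).all(1), data['Win'].map({1: 2, 2: 1}))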
Name1 Name2 Count1 Count2 Win
0 bas jaap 2 1 2
1 piet max 5 6 2
2 piet tim 8 2 1
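The same idea generalizes to any number of column pairs by reusing one index array for every pair. Below is a sketch; shuffle_pairs and its parameters are hypothetical names, and it draws the random numbers from np.random.default_rng instead of np.random.rand so that a seeded generator can be passed in.

```python
import numpy as np
import pandas as pd


def shuffle_pairs(df, pairs, flip_col, flip_map, rng=None):
    """Hypothetical helper: shuffle every column pair with the SAME per-row
    permutation and remap flip_col in the rows whose order changed."""
    rng = np.random.default_rng() if rng is None else rng
    out = df.copy()
    # One permutation of [0, 1] per row, shared by all pairs
    i = rng.random((len(out), 2)).argsort(axis=1)
    for pair in pairs:
        cols = list(pair)
        out[cols] = np.take_along_axis(out[cols].to_numpy(), i, axis=1)
    # Rows still in [0, 1] order keep flip_col; the others are remapped
    unchanged = (i == [0, 1]).all(axis=1)
    out[flip_col] = out[flip_col].where(unchanged, out[flip_col].map(flip_map))
    return out


d = {'Name1': ['jaap', 'piet', 'tim'], 'Name2': ['bas', 'max', 'piet'],
     'Count1': [1, 5, 2], 'Count2': [2, 6, 8], 'Win': [1, 2, 2]}
data = pd.DataFrame(d)
result = shuffle_pairs(data, [('Name1', 'Name2'), ('Count1', 'Count2')],
                       'Win', {1: 2, 2: 1}, rng=np.random.default_rng(0))
print(result)
```

Adding another pair is just another tuple in the pairs list. The function works on a copy, so the input frame is left unchanged.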