Reputation: 738
Consider a dataframe df
with N
columns and M
rows:
>>> df = pd.DataFrame(np.random.randint(1, 10, (10, 5)), columns=list('abcde'))
>>> df
a b c d e
0 4 4 5 5 7
1 9 3 8 8 1
2 2 8 1 8 5
3 9 5 1 2 7
4 3 5 8 2 3
5 2 8 8 2 8
6 3 1 7 2 6
7 4 1 5 6 3
8 5 4 4 9 5
9 3 7 5 6 6
I want to randomly choose two columns and then randomly choose one particular row (this would give me two values of the same row). I can achieve this using
>>> df.sample(2, axis=1).sample(1,axis=0)
e a
1 3 5
I want to perform this K
times like below :
>>> for i in xrange(5):
... df.sample(2, axis=1).sample(1,axis=0)
...
e a
1 3 5
d b
2 1 9
e b
4 8 9
c b
0 6 5
e c
1 3 5
I want to ensure that I do not choose the same two values (by choosing the same two columns and same row) in any of the trials. How would I achieve this?
I want to then perform a bitwise XOR operation on the two chosen values in each trial as well. For example, 3 ^ 5, 1 ^ 9 , .. and count all the bit differences in the chosen values.
Upvotes: 2
Views: 674
Reputation: 59579
You can create a list of all of the index by 2 column tuples. And then take random selections from that without replacement.
import pandas as pd
import numpy as np
from itertools import combinations, product
np.random.seed(123)
df = pd.DataFrame(np.random.randint(1, 10, (10, 5)), columns=list('abcde'))
#df = df.reset_index() #if index contains duplicates
K = 5
choices = np.array(list(product(df.index, combinations(df.columns, 2))))
idx = choices[np.r_[np.random.choice(len(choices), K, replace=False)]]
#array([[9, ('a', 'e')],
# [2, ('a', 'e')],
# [1, ('a', 'c')],
# [3, ('b', 'e')],
# [8, ('d', 'e')]], dtype=object)
Then you can decide how exactly you want your output, but something like this is close to what you show:
pd.concat([df.loc[myid[0], list(myid[1])].reset_index().T for myid in idx])
# 0 1
#index a e
#9 4 8
#index a e
#2 1 1
#index a c
#1 7 1
#index b e
#3 2 3
#index d e
#8 5 7
Upvotes: 4