Python: create combinations of IDs based on conditions

Question

Hi, I would like to create combinations of IDs. I know how to create all possible combinations, but I am stuck on one final part of the operation. Any help would be greatly appreciated.

I have a dataset as follows:

import pandas as pd
from itertools import combinations_with_replacement

d1 = {'Subject': ['Subject1','Subject1','Subject1','Subject2','Subject2','Subject2','Subject3','Subject3','Subject3','Subject4','Subject4','Subject4','Subject5','Subject5','Subject5'],
'Actual':['1','0','0','0','0','1','0','1','0','0','0','0','1','0','1'],
'Event':['1','2','3','1','2','3','1','2','3','1','2','3','1','2','3'],
'Category':['1','1','2','1','1','2','2','2','2','1','1','1','1','2','1'],
'Variable1':['1','2','3','4','5','6','7','8','9','10','11','12','13','14','15'],
'Variable2':['12','11','10','9','8','7','6','5','4','3','2','1','-1','-2','-3'],
'Variable3': ['-6','-5','-4','-3','-4','-3','-2','-1','0','1','2','3','4','5','6']}
d1 = pd.DataFrame(d1)

I want to create all possible combinations of the subjects within each event and each category. This is done as follows (from a previous question, "Form groups of individuals python (pandas)"):

L = [(i[0], i[1], y[0], y[1]) for i, x in d1.groupby(['Event','Category'])['Subject'] 
                          for y in list(combinations_with_replacement(x, 2))]
df = pd.DataFrame(L, columns=['Event','Category','Subject_IDcol1','Subject_IDcol2'])
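For reference, here is a minimal sketch of what combinations_with_replacement yields for a single group. The three subject names are made up for illustration and do not correspond to a particular event/category group in the data.

```python
from itertools import combinations_with_replacement

# Hypothetical members of one Event/Category group (illustration only)
subjects = ['Subject1', 'Subject2', 'Subject4']

# Unordered pairs, where a subject may also be paired with itself
pairs = list(combinations_with_replacement(subjects, 2))
for pair in pairs:
    print(pair)
```

Each subject is paired with itself and with every subject that comes after it in the input, so the order of the input list determines which ID lands in the first column.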

Now, within each event/category group, I want to keep all of the subjects for which Actual = 1 and randomly select "n" subjects for which Actual = 0. For simplicity's sake, let's take n = 1. I then want to run combinations_with_replacement on this reduced list.

The output that I want to get for example (assuming random selection) is something like this:

For event 1, category 1: Subject 1 and 5 have Actual = 1 and suppose Subject 2 is randomly drawn.

By comparison, in the previous case the result was something like this (for Event = 1 and Category = 1):

Any help will be appreciated. Thanks.

javidcf · Accepted Answer

I think this is one way to do what you want:

import itertools
import pandas as pd
import numpy as np

d1 = {
    'Subject': ['Subject1', 'Subject1', 'Subject1', 'Subject2', 'Subject2', 'Subject2',
                'Subject3', 'Subject3', 'Subject3', 'Subject4', 'Subject4', 'Subject4',
                'Subject5', 'Subject5', 'Subject5'],
    'Actual': ['1', '0', '0', '0', '0', '1', '0', '1', '0', '0', '0', '0', '1', '0', '1'],
    'Event': ['1', '2', '3', '1', '2', '3', '1', '2', '3', '1', '2', '3', '1', '2', '3'],
    'Category': ['1', '1', '2', '1', '1', '2', '2', '2', '2', '1', '1', '1', '1', '2', '1'],
    'Variable1': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15'],
    'Variable2': ['12', '11', '10', '9', '8', '7', '6', '5', '4', '3', '2', '1', '-1', '-2', '-3'],
    'Variable3': ['-6', '-5', '-4', '-3', '-4', '-3', '-2', '-1', '0', '1', '2', '3', '4', '5', '6']
}
d1 = pd.DataFrame(d1)
num_nonactual = 1

np.random.seed(100)
# First leave only up to num_nonactual subjects with actual != '1' for each event/category
g1 = d1.groupby(['Event', 'Category', 'Actual'], group_keys=False)
d2 = g1.apply(lambda x: x if x.name[2] == '1' else x.sample(min(num_nonactual, len(x))))
# Then do the same as before
d2.sort_values('Subject', inplace=True)
L = [(i1, i2, y1, y2)
     for (i1, i2), x in d2.groupby(['Event', 'Category'])['Subject']
     for y1, y2 in itertools.combinations_with_replacement(x, 2)]
df = pd.DataFrame(L, columns=['Event', 'Category', 'Subject_IDcol1', 'Subject_IDcol2'])
print(df)

Output:

   Event Category Subject_IDcol1 Subject_IDcol2
0      1        1       Subject1       Subject1
1      1        1       Subject1       Subject4
2      1        1       Subject1       Subject5
3      1        1       Subject4       Subject4
4      1        1       Subject4       Subject5
5      1        1       Subject5       Subject5
6      1        2       Subject3       Subject3
7      2        1       Subject2       Subject2
8      2        2       Subject3       Subject3
9      2        2       Subject3       Subject5
10     2        2       Subject5       Subject5
11     3        1       Subject4       Subject4
12     3        1       Subject4       Subject5
13     3        1       Subject5       Subject5
14     3        2       Subject2       Subject2
15     3        2       Subject2       Subject3
16     3        2       Subject3       Subject3
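
As a possible alternative, the same idea can be sketched with an explicit loop over the groups instead of groupby(...).apply with x.name. Depending on the pandas version, apply on a frame that still contains the grouping columns may emit a deprecation warning, so a plain loop may be easier to keep working. The helper name sample_then_pair and its parameters n_nonactual and seed are made up for this sketch, and only the columns needed for the pairing are reproduced.

```python
import itertools

import numpy as np
import pandas as pd

# Same Subject/Actual/Event/Category values as above, written compactly;
# the Variable columns are omitted because the pairing does not use them.
d1 = pd.DataFrame({
    'Subject': [f'Subject{i}' for i in range(1, 6) for _ in range(3)],
    'Actual': list('100001010000101'),
    'Event': list('123') * 5,
    'Category': list('112112222111121'),
})


def sample_then_pair(df, n_nonactual=1, seed=100):
    """Hypothetical helper: per Event/Category group, keep every subject
    with Actual == '1', draw up to n_nonactual of the others at random,
    then build all pairs with replacement."""
    rng = np.random.RandomState(seed)  # one generator shared by all groups
    rows = []
    for (event, category), grp in df.groupby(['Event', 'Category']):
        actual = grp[grp['Actual'] == '1']
        others = grp[grp['Actual'] != '1']
        k = min(n_nonactual, len(others))
        # Skip sampling entirely when the group has no non-actual subjects
        picked = others.sample(n=k, random_state=rng) if k > 0 else others.iloc[:0]
        subjects = sorted(list(actual['Subject']) + list(picked['Subject']))
        rows.extend(
            (event, category, s1, s2)
            for s1, s2 in itertools.combinations_with_replacement(subjects, 2)
        )
    return pd.DataFrame(
        rows, columns=['Event', 'Category', 'Subject_IDcol1', 'Subject_IDcol2'])


result = sample_then_pair(d1)
print(result)
```

Which non-actual subject is drawn in each group depends on the random generator, so the individual rows may differ from the output above. With n_nonactual = 1 the number of rows is the same for this dataset, because each group ends up with the same number of subjects whichever one is drawn.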
