Supervised Learning Random Forest by Group

Question

I have a training data set like this (but much larger):

       Group   PID  Var1  Var2  Best
    0    111     1     1     1     1
    1    111     2     2     1     2
    2    111     3     1     2     2
    3    112     1     1     2     2
    4    112     2     2     1     1
    5    113     1     1     2     2
    6    113     2     1     1     2
    7    113     3     2     1     1
    8    113     4     3     2     2

Each group (rows that share a Group number) contains a list of people (each with a unique PID within the group). Exactly one person in each group has Best = 1, and the rest have Best = 2. My goal is to use this training data to predict which person in each group is the best (Best = 1) based on Var1 and Var2.

I have played around with scikit-learn and tried using a random forest model to predict Best for the test data, but it does not account for the groups and can assign Best = 1 to more than one PID per group.

I was wondering how to train/run the model so that it learns to assign a single Best = 1 per group, instead of classifying every row independently across all rows and groups. Pointers to helpful resources would be just as good, as I'm not exactly sure where to go for help on this.
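
For concreteness, here is a minimal sketch of one possible direction, not a built-in scikit-learn feature: keep the random forest as a per-row scorer, then enforce the "one Best = 1 per group" constraint as a post-processing step by taking the row with the highest predicted probability of Best = 1 within each group. The helper name `pick_best_per_group`, the columns `p_best` and `Best_pred`, the hyperparameters, and the test rows (groups 201 and 202) are all hypothetical, introduced only for illustration.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Training data copied from the sample table in the question.
train = pd.DataFrame({
    "Group": [111, 111, 111, 112, 112, 113, 113, 113, 113],
    "PID":   [1, 2, 3, 1, 2, 1, 2, 3, 4],
    "Var1":  [1, 2, 1, 1, 2, 1, 1, 2, 3],
    "Var2":  [1, 1, 2, 2, 1, 2, 1, 1, 2],
    "Best":  [1, 2, 2, 2, 1, 2, 2, 1, 2],
})
features = ["Var1", "Var2"]


def pick_best_per_group(model, df):
    """Hypothetical helper: mark exactly one row per Group as Best_pred = 1.

    Assumes df has a unique index (the default RangeIndex is fine).
    """
    out = df.copy()
    # Column of predict_proba that corresponds to the class label Best == 1.
    best_col = list(model.classes_).index(1)
    out["p_best"] = model.predict_proba(out[features])[:, best_col]
    # Index label of the highest-scoring row in each group
    # (ties go to the first such row in the group).
    winners = out.groupby("Group")["p_best"].idxmax()
    out["Best_pred"] = 2
    out.loc[winners.to_numpy(), "Best_pred"] = 1
    return out


# Assumed hyperparameters; tune them for real data.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(train[features], train["Best"])

# Hypothetical unseen groups, made up for this example.
test = pd.DataFrame({
    "Group": [201, 201, 201, 202, 202],
    "PID":   [1, 2, 3, 1, 2],
    "Var1":  [1, 2, 3, 2, 1],
    "Var2":  [2, 1, 1, 1, 2],
})
pred = pick_best_per_group(model, test)
print(pred[["Group", "PID", "p_best", "Best_pred"]])
```

The model itself still scores rows independently here; only the final assignment is group-aware. If the rows within a group should influence each other during training, features computed relative to the group (for example, Var1 minus the group mean of Var1) could be added before fitting, which is a further assumption beyond what is described above.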
