Nizag

Reputation: 979

Supervised Learning Random Forest by Group

So I have a training data set like this (but much larger):

       Group   PID  Var1  Var2  Best
    0    111     1     1     1     1
    1    111     2     2     1     2
    2    111     3     1     2     2
    3    112     1     1     2     2
    4    112     2     2     1     1
    5    113     1     1     2     2
    6    113     2     1     1     2
    7    113     3     2     1     1
    8    113     4     3     2     2

Each group (rows that share a Group number) contains a list of people (each with a unique PID within the group). Exactly one person in each group has Best = 1, and the rest have Best = 2. My goal is to use this training data to predict which person in each group is the best (Best = 1) based on Var1 and Var2.

I have played around with scikit-learn and tried using its random forest model to predict Best for the test data, but the model does not account for the groups and can assign Best = 1 to more than one PID per group.

I was wondering how to train and run the model so that it learns to assign a single Best = 1 per group, instead of assigning it independently across all rows and groups. Pointers to helpful resources would be just as good, as I'm not exactly sure where to go for help on this.

Upvotes: 0

Views: 1068

Answers (1)

Prune

Reputation: 77910

When a feature is not a well-ordered metric (for example, a discrete classification such as your Group label), we use one-hot encoding. This means that for N classes (distinct values) of the original feature, we create a family of N features. In each row, exactly one of them is "good" (usually 1), while the others are "bad" (typically 0). You can read this as a set of Boolean functions: isGroup111(), isGroup112(), ...

   Group111 Group112 Group113   PID  Var1  Var2  Best
0     1        0        0        1     1     1     1
1     1        0        0        2     2     1     2
2     1        0        0        3     1     2     2
3     0        1        0        1     1     2     2
4     0        1        0        2     2     1     1
5     0        0        1        1     1     2     2
6     0        0        1        2     1     1     2
7     0        0        1        3     2     1     1
8     0        0        1        4     3     2     2
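
As a rough illustration, the table above could be produced with pandas. This is only a sketch: it assumes the data lives in a pandas DataFrame named `df` (a name chosen here for illustration) and uses `pd.get_dummies`, which appends the new columns at the end rather than at the front.

```python
import pandas as pd

# Sample data copied from the question; `df` is an illustrative name.
df = pd.DataFrame({
    "Group": [111, 111, 111, 112, 112, 113, 113, 113, 113],
    "PID":   [1, 2, 3, 1, 2, 1, 2, 3, 4],
    "Var1":  [1, 2, 1, 1, 2, 1, 1, 2, 3],
    "Var2":  [1, 1, 2, 2, 1, 2, 1, 1, 2],
    "Best":  [1, 2, 2, 2, 1, 2, 2, 1, 2],
})

# Replace the Group column with one 0/1 indicator column per distinct
# group value (Group111, Group112, Group113).
encoded = pd.get_dummies(
    df, columns=["Group"], prefix="Group", prefix_sep="", dtype=int
)

print(encoded)
```

Every column of `encoded` except Best can then be passed to the model as features.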

Upvotes: 2
