Vitaliy Poletaev

Reputation: 113

Manual prediction in R (data frame)

I have a data frame:

DF
   Chset Choices X1 X2 utility
1      1       8  1  1       2
2      1       2  0  1       3
3      1       1  1  0      -1
4      2       1  1  1       2
5      2       5  0  1       5
6      2       1  1  0      -1
7      2       2  0  0       0
8      3       1  1  1       2
9      3       2  0  1       6
10     3       5  1  0      -1
11     4       6  1  1       2
12     4       1  0  1      14
13     4       1  1  0      -1
14     4       1  0  0       0
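
To make this reproducible, the data frame above can be rebuilt roughly as follows. This is a sketch with the values typed in from the printout; the columns are assumed to be plain numerics.

```r
# Rebuild the example data frame from the printed values above.
DF <- data.frame(
  Chset   = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4),
  Choices = c(8, 2, 1, 1, 5, 1, 2, 1, 2, 5, 6, 1, 1, 1),
  X1      = c(1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0),
  X2      = c(1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0),
  utility = c(2, 3, -1, 2, 5, -1, 0, 2, 6, -1, 2, 14, -1, 0)
)
```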

I want to create a column "predict" that is 1 if utility is the maximum within its Chset and 0 otherwise. For example, the 3 rows with Chset=1 have utilities (2, 3, -1), so "predict" should be (0, 1, 0): 1 for row 2, because it has the maximum utility in Chset=1, and so on:

   Chset Choices X1 X2 utility predict
1      1       8  1  1       2       0
2      1       2  0  1       3       1
3      1       1  1  0      -1       0
4      2       1  1  1       2       0
5      2       5  0  1       5       1
6      2       1  1  0      -1       0
7      2       2  0  0       0       0
8      3       1  1  1       2       0
9      3       2  0  1       6       1
10     3       5  1  0      -1       0
11     4       6  1  1       2       0
12     4       1  0  1      14       1
13     4       1  1  0      -1       0
14     4       1  0  0       0       0

After that, I want to check whether the prediction is right. The prediction is correct if predict=1 and the value in column "Choices" is the maximum in its "Chset". For example, in Chset=1 "predict" is 1 for the 2nd row, whereas the maximum "Choices" in Chset=1 is in the 1st row (it equals 8), so the prediction is incorrect. By contrast, in Chset=2 "predict" is 1 for the 5th row, and this row has the maximum value of "Choices" within Chset=2, so here the prediction is correct. To check all cases, I want to create a column "check" which is 1 if the prediction is correct and 0 otherwise. Finally, I should get:

   Chset Choices X1 X2 utility predict check
1      1       8  1  1       2       0     0
2      1       2  0  1       3       1     0
3      1       1  1  0      -1       0     0
4      2       1  1  1       2       0     0
5      2       5  0  1       5       1     1
6      2       1  1  0      -1       0     0
7      2       2  0  0       0       0     0
8      3       1  1  1       2       0     0
9      3       2  0  1       6       1     0
10     3       5  1  0      -1       0     0
11     4       6  1  1       2       0     0
12     4       1  0  1      14       1     0
13     4       1  1  0      -1       0     0
14     4       1  0  0       0       0     0

How can I do that?

Waiting for your help

Upvotes: 0

Views: 109

Answers (1)

Ernest A

Reputation: 7839

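This should do it:

DF <- 
unsplit(lapply(split(DF, DF$Chset),   # one sub-data-frame per Chset
               function(x)  within(x, {
                   # 1 for the row(s) with the highest utility in this Chset
                   predict <- as.numeric(utility == max(utility))
                   # 1 only if that row also has the highest Choices
                   check <- as.numeric(Choices == max(Choices) & predict == 1)
               })),
        DF$Chset)                     # reassemble in the original row order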
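
An alternative sketch that avoids the split/unsplit round trip uses base R's ave() to compute the per-Chset maximum in place. It is a different technique from the code above, and the data frame is retyped here from the question only so that the block runs on its own.

```r
# Example data retyped from the question (assumed to be plain numerics).
DF <- data.frame(
  Chset   = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4),
  Choices = c(8, 2, 1, 1, 5, 1, 2, 1, 2, 5, 6, 1, 1, 1),
  X1      = c(1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0),
  X2      = c(1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0),
  utility = c(2, 3, -1, 2, 5, -1, 0, 2, 6, -1, 2, 14, -1, 0)
)

# ave() returns, for every row, the group-wise maximum of its Chset.
max_utility <- ave(DF$utility, DF$Chset, FUN = max)
max_choices <- ave(DF$Choices, DF$Chset, FUN = max)

# 1 where the row has the highest utility in its Chset, 0 otherwise.
DF$predict <- as.numeric(DF$utility == max_utility)

# 1 where the predicted row also has the highest Choices in its Chset.
DF$check <- as.numeric(DF$predict == 1 & DF$Choices == max_choices)
```

With either approach, rows that tie for the maximum within a Chset are all flagged, so a tie-breaking rule would be needed if exactly one prediction per Chset is required.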

Upvotes: 1
