amisos55
amisos55

Reputation: 1979

Generate a new variable based on values change in another variable r

I asked something very similar [enter link description here][1] but I have a better understanding of my problem now. I will try my best to ask it as clear as I can.

I have a sample dataset looks like this below:

    id <-       c(1,1,1, 2,2,2, 3,3, 4,4, 5,5,5,5, 6,6,6, 7, 8,8, 9,9, 10,10)
item.id <-  c(1,1,2, 1,1,1 ,1,1, 1,2, 1,2,2,2, 1,1,1, 1, 1,2, 1,1, 1,1)
sequence <- c(1,2,1, 1,2,3, 1,2, 1,1, 1,1,2,3, 1,2,3, 1, 1,1, 1,2, 1,2)
score <-    c(0,0,0, 0,0,1, 2,0, 1,1, 1,0,1,1, 0,0,0, 1, 0,2, 1,2, 2,1)

data <- data.frame("id"=id, "item.id"=item.id, "sequence"=sequence, "score"=score)
> data
   id item.id sequence score
1   1       1        1     0
2   1       1        2     0
3   1       2        1     0
4   2       1        1     0
5   2       1        2     0
6   2       1        3     1
7   3       1        1     2
8   3       1        2     0
9   4       1        1     1
10  4       2        1     1
11  5       1        1     1
12  5       2        1     0
13  5       2        2     1
14  5       2        3     1
15  6       1        1     0
16  6       1        2     0
17  6       1        3     0
18  7       1        1     1
19  8       1        1     0
20  8       2        1     2
21  9       1        1     1
22  9       1        2     2
23 10       1        1     2
24 10       1        2     1

id represents for each student, item.id represents the questions students take, sequence is the attempt number for each item.id, and score is the score for each attempt, taking 0,1, or 2. Students can change their answers.

For item.id within each id, I create a variable (status) by looking at the last two sequences (changes): Here the recoding rules are for status:

1-If there is only one attempt for each question: 
    a) assign "BTW" (Blank to Wrong) if the item score is 0.
    b) assign "BTW" (Blank to Right) if the item score is 1.
2-If there are multiple attempts for each question:
    a) assign "BTW" (Blank to Wrong) if the first item attempt score is 0.
    b) assign "BTW" (Blank to Right) if the first item attempt score is 1.
    c) assign "WW" for those who changed from wrong to wrong (0 to 0),
    d) assign "WR" for those who changed to increasing score (0 to 1, or 1 to 2),
    e) assign "RW" for those who changed to decreasing score (2 to 1, 2 to 0, or 1 to 0 ), and
    f) assign "RR" for those who changed from right to right (1 to 1, 2 to 2).

score change from 0 to 1 or 0 to 2 or 1 to 2 considered correct (right) change while, score change from 1 to 0 or 2 to 0 or 2 to 1 considered incorrect (wrong) change.

If there is only one attempt for item.id as in id=7, then the status should be "BTR". If the score was 0, then it should be "BTW". the logic is supposed to be if the score increases, it should be WR, if it decreases, it should be RW.

a) from 1 to 2 as WR, instead, they were coded as RR,
b) from 2 to 1 as RW, instead, they were coded as WW.

I used this code. Things did not work out for some, for example for id=1. The status should be {BTW, WW}.

library(dplyr)
data %>%   group_by(id,item.id) %>%
 mutate(diff = c(0, diff(score)),
                status = case_when(
                  n() == 1 & score == 0 ~ "BTW",
                  n() == 1 & score == 1 ~ "BTR",
                  diff == 0 & score == 0 ~ "WW",
                  diff == 0 & score > 0 ~ "RR",
                  diff > 0 ~ "WR",
                  diff < 0 ~ "RW",
                  TRUE ~ "oops")) 

> data
   id item.id sequence score diff status
1   1       1        1     0    0     WW
2   1       1        2     0    0     WW
3   1       2        1     0    0    BTW
4   2       1        1     0    0     WW
5   2       1        2     0    0     WW
6   2       1        3     1    1     WR
7   3       1        1     2    0     RR
8   3       1        2     0   -2     RW
9   4       1        1     1    0    BTR
10  4       2        1     1    0    BTR
11  5       1        1     1    0    BTR
12  5       2        1     0    0     WW
13  5       2        2     1    1     WR
14  5       2        3     1    0     RR
15  6       1        1     0    0     WW
16  6       1        2     0    0     WW
17  6       1        3     0    0     WW
18  7       1        1     1    0    BTR
19  8       1        1     0    0    BTW
20  8       2        1     2    0     RR
21  9       1        1     1    0     RR
22  9       1        2     2    1     WR
23 10       1        1     2    0     RR
24 10       1        2     1   -1     RW

the desired output would be with cases:

  > desired

   id item.id sequence score  status
1   1       1        1     0   BTW
2   1       1        2     0   WW
3   1       2        1     0   BTW
4   2       1        1     0   BTW
5   2       1        2     0   WW
6   2       1        3     1   WR
7   3       1        1     2   BTR
8   3       1        2     0   RW
9   4       1        1     1   BTR
10  4       2        1     1   BTR
11  5       1        1     1   BTR
12  5       2        1     0   BTW
13  5       2        2     1   WR
14  5       2        3     1   RR
15  6       1        1     0   BTW
16  6       1        2     0   WW
17  6       1        3     0   WW
18  7       1        1     1   BTR
19  8       1        1     0   BTW
20  8       2        1     2   BTR
21  9       1        1     1   BTR
22  9       1        2     2   RR
23 10       1        1     2   BTR
24 10       1        2     1   RW

Any opinions? Thanks!

Upvotes: 0

Views: 958

Answers (1)

Dave2e
Dave2e

Reputation: 24079

In order to solve this, I broke the problem down into two steps. First identify the Blank to answer lines. Then once the first tries are identified then assign the change of answers to the remaining lines.

#rows that are not the first answer are assigned a "NA"
test<-data %>%   group_by(id,item.id) %>%
  mutate(status = case_when(
    sequence == 1 & score == 0 ~ "BTW",
    sequence == 1 & score >0 ~ "BTR",
    TRUE ~ "NA")) 

answer<- test %>% ungroup() %>% group_by(id, item.id) %>%
  transmute(sequence, score, 
            status = case_when(score == 0 & score==lag(score) & status=="NA" ~ "WW",
                               score >= 1 & score == lag(score) & status=="NA"~ "RR",
                               score > 0 & score > lag(score) & status=="NA"~ "WR",
                               score < lag(score) & status=="NA"~ "RW",
                               TRUE ~ status)) 


head(answer, 20)
tail(answer, 4)

The status column matches your sample data for all rows except row 20, please double check the calculation.

Upvotes: 1

Related Questions