Reputation: 1040
I have data such as this.
data.input <-read_table2("user.id problem.id first.correct cwa_prob prob.seq
540995 PRABNBZQ 0 1 problem.id.problem.1
540995 PRABNBZ2 1 0 problem.id.problem.2
540995 PRABNBZK 0 1 problem.id.problem.3
540995 PRABNB2B 0 1 problem.id.problem.4
540995 PRABNBWE 0 1 problem.id.problem.5
565662 PRABNB2B 1 0 problem.id.problem.1
565662 PRABNBZQ 1 1 problem.id.problem.2
565662 PRABNBZ9 1 0 problem.id.problem.3
565662 PRABNBZV 1 0 problem.id.problem.4
565662 PRABNBWG 1 0 problem.id.problem.5
")
I am trying to use the lag function to compare the occurrence of a flag in two separate columns. I want to see how a person fairs sequentially from one problem to the next. Specifically, if cwa_prob
is 1
for problem.id.problem.1
(user 540995
), is it the case that first.correct==1
in problem.id.problem.2
(user 540995
).
I tried something like this, but it throws me an error.
input.data %>%
group_by(user.id) %>%
mutate(post.cwa.correct=ifelse(cwa_prob==1 & lag(first.correct==1),1,0))
I would like my output to look something like this:
data.output <-read_table2("user.id problem.id first.correct cwa_prob prob.seq post.cwa.correct
540995 PRABNBZQ 0 1 problem.id.problem.1 NA
540995 PRABNBZ2 1 0 problem.id.problem.2 1
540995 PRABNBZK 0 1 problem.id.problem.3 NA
540995 PRABNB2B 0 1 problem.id.problem.4 0
540995 PRABNBWE 0 1 problem.id.problem.5 0
565662 PRABNB2B 1 0 problem.id.problem.1 1
565662 PRABNBZQ 1 1 problem.id.problem.2 NA
565662 PRABNBZ9 1 0 problem.id.problem.3 1
565662 PRABNBZV 1 0 problem.id.problem.4 NA
565662 PRABNBWG 1 0 problem.id.problem.5 NA
")
Maybe the lag function isn't the best function for this? Any help is appreciated!
Upvotes: 0
Views: 70
Reputation: 388982
Based on your description I think this is what you want :
library(dplyr)
data.input %>%
group_by(user.id) %>%
mutate(post.cwa.correct = as.integer(first.correct == 1 & lag(cwa_prob) == 1))
This will give a value of 1 when the current row has first.correct = 1
and the previous row has cwa_prob = 1
.
Note that you don't need ifelse(condition, 1, 0)
since you can convert TRUE
/FALSE
values to 1/0 by converting them to integer. Also in your attempt you have lag(first.correct==1)
which is different from lag(first.correct) ==1
.
Upvotes: 1
Reputation: 561
I ran everything, and it worked fine. However the output you list as wanted doesn't actually meet your criteria...you'd want
data.input %>%
group_by(user.id) %>%
mutate(post.cwa.correct=ifelse(cwa_prob==0, NA, ifelse(
cwa_prob==1 & lead(first.correct==1),1,0)
))
However, you list the dataset as "data.input," but then the dataset you use is listed as "input.data."
Upvotes: 0
Reputation: 6931
Take a look at the lag
documentation. There is an order_by
argument that determines the sort order of your records.
Unless you sent this argument, R uses the existing/default order for your dataset. So different ordering will produce different answers. This is a likely explanation for why your code has not worked, but @James founds that the code ran fine.
input.data %>%
group_by(user.id) %>%
mutate(post.cwa.correct=ifelse(cwa_prob==1 & lag(first.correct==1, order_by = "prob.seq"),1,0))
Note that as the format of prob_seq
is most likely character, the order_by
clause will sort alphabetically. This means 'problem.10'
will come before 'problem.2'
. So you'll need to extract and convert the problem number to numeric if this occurs in your data.
Upvotes: 0