Reputation: 53
I have data set as follows (it is just a sample below):
dataframe<-data.frame("id" = c(1,2,5,7,9,21,22,23),"questionfk"=c(145,51,51,145,145,51,145,51))
In this data id represents the order of the questions. Questionfk, is the question id.
I would like to filter this data on questionfk 145 and 51, where 145 is asked right before 51 was the second question after. So what I want in the end seems like below:
dataframefiltered<-data.frame("id" = c(1,2,22,23),"questionfk"=c(145,51,145,51))
I did this with lots of if's and for's is it possible to do this with data.table? and How?
Thank you!
Upvotes: 0
Views: 56
Reputation: 887078
May be this helps
library(data.table)
setDT(dataframe)[dataframe[, {indx=which(c(TRUE, questionfk[-1]==145 &
questionfk[-.N]==51) & c(TRUE, diff(id)==1))
sort(c(indx, indx+1))}]]
# id questionfk
#1: 1 145
#2: 2 51
#3: 22 145
#4: 23 51
Upvotes: 1
Reputation: 693
I'm not sure I understand the exact conditions you're looking for, but I'm basing this on wanting to select questions 145 and 51, but only when then come consecutively in that order. I realize that this does not give the same result as you show, but presumably you can modify this to match the right conditions.
Rather than data.table, here's a way to do it with dplyr (which is also fast with big datasets, and very elegant):
dataframe %>%
mutate(last_question = lag(questionfk),
next_question = lead(questionfk),
after_145 = last_question==145,
before_51 = next_question==51) %>%
filter(after_145 | before_51) %>%
select(id, questionfk)
Upvotes: 0