Reputation: 41
Let's create an example first:
scale1 <- c(5,NA,2,1)
scale2 <- c(NA,4,NA,3)
scale3 <- c(3,NA,5,NA)
scale4 <- c(2,1,NA,5)
df <- data.frame(scale1, scale2, scale3, scale4)
df
Here is the output:
## scale1 scale2 scale3 scale4
#1 5 NA 3 2
#2 NA 4 NA 1
#3 2 NA 5 NA
#4 1 3 NA 5
Here is where I'm stuck.
I am running a survey in which participants rate items on multiple scales. The values are supposed to follow this order:
scale 1 >= scale 2 >= scale 3 >= scale 4
So I want to remove the rows that violate this order while keeping rows that contain NA (the scales are randomly assigned).
The output should look like this (cases 3 and 4 removed):
## scale1 scale2 scale3 scale4
#1 5 NA 3 2
#2 NA 4 NA 1
Is there an efficient way to achieve this? I have many sets of scales in my actual data.
Thank you!
Upvotes: 0
Views: 31
Reputation: 388862
You can do this with a row-wise apply:
cols <- grep('scale', names(df))
# keep rows whose non-NA values never increase from left to right
df[apply(df[cols], 1, function(x) all(diff(na.omit(x)) <= 0)), ]
# scale1 scale2 scale3 scale4
#1 5 NA 3 2
#2 NA 4 NA 1
The same using dplyr:
library(dplyr)
df %>%
  rowwise() %>%
  filter(all(diff(na.omit(c_across(starts_with('scale')))) <= 0)) %>%
  ungroup()
This keeps the rows in which every non-NA value is less than or equal to the previous non-NA value in the row. The comparison uses <= 0 rather than < 0 so that ties satisfy the >= ordering stated in the question.
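To see what the check does on a single row, here is a minimal sketch in base R. The helper name is_non_increasing is hypothetical and introduced only for illustration. It uses <= 0 so that equal neighbouring values pass, matching the >= ordering in the question; a strict < 0 would additionally drop rows that contain ties.

```r
# Hypothetical helper: TRUE when the observed (non-NA) values never increase.
is_non_increasing <- function(x) {
  x <- x[!is.na(x)]   # drop NAs so only observed ratings are compared
  all(diff(x) <= 0)   # all() of an empty vector is TRUE, so 0 or 1 values pass
}

r1 <- is_non_increasing(c(5, NA, 3, 2))   # row 1: differences are -2, -1
r3 <- is_non_increasing(c(2, NA, 5, NA))  # row 3: difference is +3
tie <- is_non_increasing(c(4, 4, NA, 1))  # a tie: differences are 0, -3
print(c(r1, r3, tie))                     # TRUE FALSE TRUE
```

Rows with a single observed value (or none) are always kept, because there is nothing to compare.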
data
df <- structure(list(scale1 = c(5, NA, 2, 1), scale2 = c(NA, 4, NA,
3), scale3 = c(3, NA, 5, NA), scale4 = c(2, 1, NA, 5)),
class = "data.frame", row.names = c(NA, -4L))
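For the "many sets of scales" case mentioned in the question, one possible sketch is to loop over column-name prefixes and keep only the rows that pass the check in every set. Everything named here is an assumption for illustration: the function keep_ordered, the survey data frame, and the mood/pain prefixes are hypothetical, and the code assumes that the columns of each set share a prefix and already appear in the intended order.

```r
# Hypothetical helper: keep rows that are non-increasing within every set.
keep_ordered <- function(data, prefixes) {
  keep <- rep(TRUE, nrow(data))
  for (p in prefixes) {
    cols <- grep(paste0("^", p), names(data))   # columns belonging to this set
    ok <- apply(data[cols], 1, function(x) all(diff(na.omit(x)) <= 0))
    keep <- keep & ok                           # row must pass in all sets
  }
  data[keep, , drop = FALSE]
}

# Made-up example data with two sets of scales
survey <- data.frame(
  mood1 = c(5, NA, 2), mood2 = c(3, 4, 5),
  pain1 = c(4, 2, 3),  pain2 = c(NA, 3, 1)
)
result <- keep_ordered(survey, c("mood", "pain"))
print(result)
```

Here only the first row survives: row 2 increases within the pain set (2 then 3) and row 3 increases within the mood set (2 then 5).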
Upvotes: 2