Yukilia

Reputation: 41

Filter a dataframe based on fixed ranking while retaining NA

Let's create an example first:

scale1 <- c(5,NA,2,1)
scale2 <- c(NA,4,NA,3)
scale3 <- c(3,NA,5,NA)
scale4 <- c(2,1,NA,5)
df <- data.frame(scale1, scale2, scale3, scale4)
df

Here is the output:

##   scale1 scale2 scale3 scale4
## 1      5     NA      3      2
## 2     NA      4     NA      1
## 3      2     NA      5     NA
## 4      1      3     NA      5

Here is where I'm stuck.

I am running a survey in which participants rate items on multiple scales. The scale values are supposed to follow this order:

scale 1 >= scale 2 >= scale 3 >= scale 4

So I want to remove the rows that violate this order while keeping the NA values (they occur because the scales are randomly assigned).

The output should look like this (rows 3 and 4 removed):

##   scale1 scale2 scale3 scale4
## 1      5     NA      3      2
## 2     NA      4     NA      1

Is there an efficient way to achieve this? I have many sets of scales in my actual data.

Thank you!

Upvotes: 0

Views: 31

Answers (1)

Ronak Shah

Reputation: 388862

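You can do this with a row-wise apply:

cols <- grep('scale', names(df))
# Keep rows whose non-missing values never increase from left to right
df[apply(df[cols], 1, function(x) all(diff(na.omit(x)) <= 0)), ]

#  scale1 scale2 scale3 scale4
#1      5     NA      3      2
#2     NA      4     NA      1

and the same using dplyr:

library(dplyr)
df %>%
  rowwise() %>%
  filter(all(diff(na.omit(c_across(starts_with('scale')))) <= 0)) %>%
  ungroup()

This selects the rows in which every non-missing value is no larger than the previous non-missing value in the row, which matches scale 1 >= scale 2 >= scale 3 >= scale 4. NA values are dropped before the comparison, so they never cause a row to be removed.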
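
To see what the condition does, it may help to run it on single rows. The following is only a sketch using two rows from the question's example; `is_ordered` is a hypothetical helper name introduced here for illustration. It uses `<= 0` so that ties are allowed, as the stated order `>=` permits; switch to `< 0` if the values must be strictly decreasing.

```r
# Hypothetical helper: TRUE when the non-missing values never increase
# from left to right (ties allowed, matching scale1 >= scale2 >= ...).
is_ordered <- function(x) all(diff(na.omit(x)) <= 0)

row1 <- c(5, NA, 3, 2)   # after dropping NA: 5, 3, 2
row3 <- c(2, NA, 5, NA)  # after dropping NA: 2, 5

diff(na.omit(row1))  # -2 -1 : every step goes down
diff(na.omit(row3))  #  3    : one step goes up, so the order is violated

is_ordered(row1)  # TRUE
is_ordered(row3)  # FALSE
```

A row with fewer than two non-missing values has no differences to check, so `all()` returns TRUE and the row is kept.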

data

df <- structure(list(scale1 = c(5, NA, 2, 1), scale2 = c(NA, 4, NA, 
3), scale3 = c(3, NA, 5, NA), scale4 = c(2, 1, NA, 5)), 
class = "data.frame", row.names = c(NA, -4L))
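
For several sets of scales, one possible approach is to wrap the same check in a function and loop over the sets. This is a rough sketch under an assumption the question does not confirm: that each set can be identified by a column-name prefix. The helper `keep_ordered`, the data frame `df2`, and the prefixes `a` and `b` are all made up for illustration.

```r
# Hypothetical helper: keep rows whose non-missing values never increase
# within each set of columns. A set is assumed to be all columns whose
# names start with a given prefix, taken in their existing column order.
keep_ordered <- function(data, prefixes) {
  ok <- rep(TRUE, nrow(data))
  for (p in prefixes) {
    cols <- grep(paste0("^", p), names(data))
    ok <- ok & apply(data[cols], 1, function(x) all(diff(na.omit(x)) <= 0))
  }
  data[ok, , drop = FALSE]
}

# Made-up data with two sets of scales, "a" and "b"
df2 <- data.frame(
  a1 = c(5, NA, 2), a2 = c(3, 4, 5),
  b1 = c(2, 2, NA), b2 = c(1, 3, 1)
)

# Row 2 fails in set "b" (2 then 3) and row 3 fails in set "a" (2 then 5),
# so only row 1 is kept.
keep_ordered(df2, c("a", "b"))
```

The columns are compared in the order they appear in the data frame, not in the order of their names, so they need to be arranged from the highest-ranked scale to the lowest beforehand.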

Upvotes: 2
