Reputation: 1200
I have a data frame consisting of participants' scores in a wide format, with each column representing one questionnaire item and each row representing the responses of one participant.
My data would look something like the following:
id <- c("001", "002", "003", "004", "005")
q1 <- c(1, 2, "", 3, 5)
q2 <- c("", 2, 3, 1, "")
q3 <- c(1, 3, 2, 5, 2)
q4 <- c(1, 2, 3, "", 4)
q5 <- c(1, 2, 3, 4, 3)
df <- data.frame(id, q1, q2, q3, q4, q5)
df[, 2:6] <- apply(df[, 2:6], 2, as.numeric)
I would like to retain rows of participants' data when they leave a particular column, let's say q4, blank. Concurrently, I would like to exclude a row as long as any item other than q4 is left blank.
The end result I have in mind is like the screenshot below. Participant 001 and 005 are removed because they have missing q2 scores. Participant 003 is removed because of missing q1 scores. Even though participant 004 has a missing q4 score, I still want to retain participant 004 in the data frame.
If possible, could someone provide me with an answer that could be used in a dplyr pipe? Thanks and much appreciated!
Upvotes: 3
Views: 319
Reputation: 6768
Try drop_na(
) from tidyr
library(tidyr)
df %>% drop_na(-q4)
# output
id q1 q2 q3 q4 q5
#2 002 2 2 3 2 2
#4 004 3 1 5 NA 4
Upvotes: 1
Reputation: 744
base r using complete.cases
and which
df[complete.cases(df[,-which(names(df) %in% "q4")]),]
# id q1 q2 q3 q4 q5
#2 002 2 2 3 2 2
#4 004 3 1 5 NA 4
or by column position
df[complete.cases(df[,-5]),]
# id q1 q2 q3 q4 q5
#2 002 2 2 3 2 2
#4 004 3 1 5 NA 4
Upvotes: 1