DTYK
DTYK

Reputation: 1200

Retain rows with blank values in one particular column and remove rows with blank values in all other columns

I have a data frame consisting of participants' scores in a wide format, with each column representing one questionnaire item and each row representing the responses of one participant.

My data would look something like the following:

id <- c("001", "002", "003", "004", "005")
q1 <- c(1, 2, "", 3, 5)
q2 <- c("", 2, 3, 1, "")
q3 <- c(1, 3, 2, 5, 2)
q4 <- c(1, 2, 3, "", 4)
q5 <- c(1, 2, 3, 4, 3)
df <- data.frame(id, q1, q2, q3, q4, q5)
df[, 2:6] <- apply(df[, 2:6], 2, as.numeric)

I would like to retain rows of participants' data when they leave a particular column, let's say q4, blank. Concurrently, I would like to exclude a row as long as any item other than q4 is left blank.

The end result I have in mind is like the screenshot below. Participant 001 and 005 are removed because they have missing q2 scores. Participant 003 is removed because of missing q1 scores. Even though participant 004 has a missing q4 score, I still want to retain participant 004 in the data frame.

End Result

If possible, could someone provide me with an answer that could be used in a dplyr pipe? Thanks and much appreciated!

Upvotes: 3

Views: 319

Answers (2)

nghauran
nghauran

Reputation: 6768

Try drop_na() from tidyr

library(tidyr)
df %>% drop_na(-q4)

# output
   id q1 q2 q3 q4 q5
#2 002  2  2  3  2  2
#4 004  3  1  5 NA  4

Upvotes: 1

Shique
Shique

Reputation: 744

base r using complete.cases and which

df[complete.cases(df[,-which(names(df) %in% "q4")]),]
#   id q1 q2 q3 q4 q5
#2 002  2  2  3  2  2
#4 004  3  1  5 NA  4

or by column position

df[complete.cases(df[,-5]),]
#   id q1 q2 q3 q4 q5
#2 002  2  2  3  2  2
#4 004  3  1  5 NA  4

Upvotes: 1

Related Questions