string

Reputation: 827

Select distinct rows in a data frame with only NA values in R

I have a data frame with three columns.

ID1 <- c(1, 1, 2, 2, 3, 4)
ID2 <- c(11, NA, 12, NA, NA, NA)
Val <- c("A", "B", "C", "D", "E", "F")
DF <- data.frame(ID1, ID2, Val, stringsAsFactors = FALSE)

Now I need to extract the unique rows that have NA in ID2. In this case, the desired output is a data frame with two rows, namely those with ID1 = 3 and ID1 = 4. I tried the subset command below, but it returns all four rows where ID2 is NA. I am looking for a way to get the desired output.

DF2 <- subset(DF , is.na(ID2))

Upvotes: 0

Views: 1040

Answers (3)

DJack

Reputation: 4940

If by unique rows you mean unique values of ID1, then this code does the trick:

DF[which(!duplicated(DF$ID1) & is.na(DF$ID2)),]

  ID1 ID2 Val
5   3  NA   E
6   4  NA   F

If you prefer using subset, then this code gives the same output:

subset(DF , !duplicated(ID1) & is.na(ID2))
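
Note that `!duplicated(ID1)` flags only the first row of each ID1, so the expression above works on this data because the non-NA row of ID1 = 1 and ID1 = 2 comes first. If the row order is not guaranteed, one possible order-independent variant in base R is sketched below. The names `has_value`, `only_na` and `result` are made up for illustration, and the sketch assumes "unique" means one row per ID1.

```r
# Same example data as in the question
ID1 <- c(1, 1, 2, 2, 3, 4)
ID2 <- c(11, NA, 12, NA, NA, NA)
Val <- c("A", "B", "C", "D", "E", "F")
DF <- data.frame(ID1, ID2, Val, stringsAsFactors = FALSE)

# ID1 values that appear with at least one non-NA ID2
has_value <- unique(DF$ID1[!is.na(DF$ID2)])

# Keep rows whose ID1 never appears with a non-NA ID2 ...
only_na <- DF[!(DF$ID1 %in% has_value), ]

# ... then keep the first row for each remaining ID1
result <- only_na[!duplicated(only_na$ID1), ]
result
```

Because the check is done per ID1 value rather than per row position, sorting or shuffling DF beforehand should not change which ID1 values are kept.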

Upvotes: 1

ytu

Reputation: 1850

Define a function that checks whether an ID1 group has only NA values in ID2, and returns the unique rows of that group if so (and no rows otherwise).

library(dplyr)

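# Return the unique rows of a group if every ID2 in it is NA,
# otherwise return zero rows (with the same columns)
select_na <- function(df_sub) {
  if (any(!is.na(df_sub$ID2))) {
    df_sub[0, ]
  } else {
    unique(df_sub)
  }
}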

DF %>%
  group_by(ID1) %>%
  do(select_na(.))

This gives exactly the output you want: the rows with ID1 = 3 and ID1 = 4.
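
The same group-wise logic can probably be written without a helper function or `do()` (which recent dplyr versions mark as superseded). A minimal sketch, assuming a dplyr version that provides `distinct()` and grouped `filter()`; the variable name `result` is only for illustration:

```r
library(dplyr)

# Same example data as in the question
ID1 <- c(1, 1, 2, 2, 3, 4)
ID2 <- c(11, NA, 12, NA, NA, NA)
Val <- c("A", "B", "C", "D", "E", "F")
DF <- data.frame(ID1, ID2, Val, stringsAsFactors = FALSE)

result <- DF %>%
  group_by(ID1) %>%
  filter(all(is.na(ID2))) %>%  # keep groups in which every ID2 is NA
  distinct() %>%               # drop exact duplicate rows
  ungroup()

result
```

Here `filter(all(is.na(ID2)))` plays the role of the `if` branch in `select_na`, and `distinct()` plays the role of `unique(df_sub)`.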

Upvotes: 0

nghauran

Reputation: 6768

Try this, which keeps the ID1 groups that consist of a single row with NA in ID2:

library(dplyr)
DF %>%
  group_by(ID1) %>%
  filter(n() == 1 & is.na(ID2))

Upvotes: 1
