Kogan

Reputation: 87

Delete rows with NA, but only under a certain condition, in R

Here is some example data:

nad=structure(list(x1 = 1:5, x2 = c(NA, 2L, 2L, NA, 34L), x3 = c(NA, 
1L, NA, NA, NA), x4 = c(NA, 2L, 5L, NA, NA), x5 = c(NA, 3L, NA, 
NA, NA), x6 = c(NA, 4L, NA, NA, NA)), .Names = c("x1", "x2", 
"x3", "x4", "x5", "x6"), class = "data.frame", row.names = c(NA, 
-5L))
  x1 x2 x3 x4 x5 x6
1  1 NA NA NA NA NA
2  2  2  1  2  3  4
3  3  2 NA  5 NA NA
4  4 NA NA NA NA NA
5  5 34 NA NA NA NA

Usually, to get complete rows without any NA, I can use this function:

na.omit(nad)

But my problem is a little more complex. Even though x2 has NA values, I do not want to delete the rows where x2 is NA. A row is valuable when there is a value for x1 but not for x2. If a row has observations for x1 and x2 but not for the other variables, it should be deleted. Therefore, rows 1 and 4 should not be deleted. Rows 3 and 5 should be deleted because they have observations for x1 and x2, but other variables are missing. Row 2 is fully complete, so I do not need to delete it. How can I delete NA rows using such a condition? Desired output:

  x1 x2 x3 x4 x5 x6
1  1 NA NA NA NA NA
2  2  2  1  2  3  4
3  4 NA NA NA NA NA

As an additional (separate but related) question: I may need this for analytics in a situation like the following:

  x1 x2 x3 x4 x5 x6
1  1 NA NA NA NA NA
2  2 NA  1  1  1  1

Here the first row has NA for x2 and for all the other variables, while the second row has NA for x2 but values for the other variables. In such a case, how can I keep only the rows where x1 has a value, x2 does not, and the other variables do have values?

Upvotes: 0

Views: 74

Answers (2)

Ronak Shah

Reputation: 388797

So maybe you are looking for

nad[!is.na(nad$x1) & is.na(nad$x2) | rowSums(!is.na(nad)) == ncol(nad), ]

#  x1 x2 x3 x4 x5 x6
#1  1 NA NA NA NA NA
#2  2  2  1  2  3  4
#4  4 NA NA NA NA NA

This selects rows where x1 has non-NA values and x2 has NA OR all the values in the row are non-NA.
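For the follow-up case in the question (keep rows where x1 is present, x2 is missing, and at least one other column has a value), the same style of logical indexing could probably be adapted. A minimal sketch, assuming the two-row example from the question is rebuilt under the hypothetical name `nad2`; `others` and `keep2` are also names made up for illustration:

```r
# Hypothetical data frame rebuilt from the second example in the question
nad2 <- data.frame(
  x1 = 1:2,
  x2 = c(NA, NA),
  x3 = c(NA, 1L),
  x4 = c(NA, 1L),
  x5 = c(NA, 1L),
  x6 = c(NA, 1L)
)

# Columns other than x1 and x2
others <- setdiff(names(nad2), c("x1", "x2"))

# x1 present, x2 missing, and at least one of the remaining columns non-NA
keep2 <- !is.na(nad2$x1) & is.na(nad2$x2) &
  rowSums(!is.na(nad2[, others, drop = FALSE])) > 0

nad2[keep2, ]
```

If "other variables have values" should mean that all of them are non-NA rather than at least one, the last comparison could be changed to `== length(others)`.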

Upvotes: 2

teunbrand

Reputation: 37903

I think you would probably be best off checking whether each row satisfies your conditions. If I understood correctly, something like the following could work:

keep <- apply(nad, 1, function(row) {
  # Don't keep data if first column is NA
  if (!is.na(row[[1]])) {
    sumna <- sum(is.na(row[-1]))
    # Only keep if rest is all NA or none is NA
    if (sumna == 0 || sumna == length(row) - 1) {
      return(TRUE)
    } else {
      return(FALSE)
    }
  } else {
    return(FALSE)
  }
})

nad[keep,]
  x1 x2 x3 x4 x5 x6
1  1 NA NA NA NA NA
2  2  2  1  2  3  4
4  4 NA NA NA NA NA
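
The same row-wise rule can also be written without `apply`, which avoids converting each row to a vector. A sketch, assuming `nad` is the data frame from the question (rebuilt here so the block runs on its own); `n_na_rest` and `keep_vec` are names made up for illustration:

```r
# Example data from the question
nad <- data.frame(
  x1 = 1:5,
  x2 = c(NA, 2L, 2L, NA, 34L),
  x3 = c(NA, 1L, NA, NA, NA),
  x4 = c(NA, 2L, 5L, NA, NA),
  x5 = c(NA, 3L, NA, NA, NA),
  x6 = c(NA, 4L, NA, NA, NA)
)

# Number of NAs per row in every column except the first
n_na_rest <- rowSums(is.na(nad[, -1, drop = FALSE]))

# Keep rows where x1 is present and the rest is either all NA or has no NA
keep_vec <- !is.na(nad$x1) & (n_na_rest == 0 | n_na_rest == ncol(nad) - 1)

nad[keep_vec, ]
```

On this data it should select the same rows (1, 2 and 4) as the `apply` version.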

Upvotes: 2
