Reputation: 87
Here is some example data:
nad=structure(list(x1 = 1:5, x2 = c(NA, 2L, 2L, NA, 34L), x3 = c(NA,
1L, NA, NA, NA), x4 = c(NA, 2L, 5L, NA, NA), x5 = c(NA, 3L, NA,
NA, NA), x6 = c(NA, 4L, NA, NA, NA)), .Names = c("x1", "x2",
"x3", "x4", "x5", "x6"), class = "data.frame", row.names = c(NA,
-5L))
  x1 x2 x3 x4 x5 x6
1  1 NA NA NA NA NA
2  2  2  1  2  3  4
3  3  2 NA  5 NA NA
4  4 NA NA NA NA NA
5  5 34 NA NA NA NA
Usually, to get complete rows without NA, I can use this function:
na.omit(nad)
But my problem is a little more complex.
Even though x2 has NA, I do not need to delete the rows where x2 is NA.
Valuable data is where there is a value for x1 and not for x2, and if a row has observations for x1 and x2 but not for the other variables, then the row should not be deleted.
Therefore, the first and fourth rows should not be deleted.
Rows 3 and 5 should be deleted, because they have observations for x1 and x2, but other variables are blank.
The second row is complete, so I do not need to delete it.
How can I delete rows with NA using such a condition?
Desired output:
  x1 x2 x3 x4 x5 x6
1  1 NA NA NA NA NA
2  2  2  1  2  3  4
3  4 NA NA NA NA NA
As an addition (a separate but related question), I also want to ask about the following situation, which I may need for analytics:
  x1 x2 x3 x4 x5 x6
1  1 NA NA NA NA NA
2  2 NA  1  1  1  1
Here the first row has NA for x2 and NA for the other variables, and the second row has NA for x2, but the other variables are not NA.
In such a case, how can I keep only the rows where x1 has a value, x2 does not, but the other variables have values?
Upvotes: 0
Views: 74
Reputation: 388797
So maybe you are looking for
nad[!is.na(nad$x1) & is.na(nad$x2) | rowSums(!is.na(nad)) == ncol(nad), ]
#  x1 x2 x3 x4 x5 x6
#1  1 NA NA NA NA NA
#2  2  2  1  2  3  4
#4  4 NA NA NA NA NA
This selects rows where x1 has a non-NA value and x2 is NA, OR where all the values in the row are non-NA.
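A self-contained sketch of the same filter, with complete.cases() swapped in for the rowSums() comparison as another base R way to test for a fully observed row, followed by a possible filter for the follow-up case in the question. The names keep_main, nad2, others, and keep_followup are made up for illustration, and the follow-up condition reflects one reading of the question: x1 present, x2 missing, and at least one of the remaining variables present.

```r
# Example data from the question
nad <- data.frame(
  x1 = 1:5,
  x2 = c(NA, 2L, 2L, NA, 34L),
  x3 = c(NA, 1L, NA, NA, NA),
  x4 = c(NA, 2L, 5L, NA, NA),
  x5 = c(NA, 3L, NA, NA, NA),
  x6 = c(NA, 4L, NA, NA, NA)
)

# Keep rows where x1 is present and x2 is missing, or where the row is complete
keep_main <- (!is.na(nad$x1) & is.na(nad$x2)) | complete.cases(nad)
nad[keep_main, ]

# Follow-up example from the question (hypothetical name nad2)
nad2 <- data.frame(
  x1 = 1:2,
  x2 = c(NA, NA),
  x3 = c(NA, 1L),
  x4 = c(NA, 1L),
  x5 = c(NA, 1L),
  x6 = c(NA, 1L)
)

# Columns other than x1 and x2
others <- setdiff(names(nad2), c("x1", "x2"))

# Assumption: "other variables have values" means at least one is non-NA
keep_followup <- !is.na(nad2$x1) & is.na(nad2$x2) &
  rowSums(!is.na(nad2[others])) > 0
nad2[keep_followup, ]
```

If every remaining variable must be present rather than at least one, the last comparison could be changed to `== length(others)`.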
Upvotes: 2
Reputation: 37903
I think you would probably be best off checking whether each row satisfies your conditions. If I understood correctly, something like the following could work:
keep <- apply(nad, 1, function(row) {
  # Don't keep data if first column is NA
  if (!is.na(row[[1]])) {
    sumna <- sum(is.na(row[-1]))
    # Only keep if rest is all NA or none is NA
    if (sumna == 0 || sumna == length(row) - 1) {
      return(TRUE)
    } else {
      return(FALSE)
    }
  } else {
    return(FALSE)
  }
})
nad[keep, ]
  x1 x2 x3 x4 x5 x6
1  1 NA NA NA NA NA
2  2  2  1  2  3  4
4  4 NA NA NA NA NA
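The same rule could probably also be written without apply(), which converts the data frame to a matrix before looping over rows. A minimal vectorized sketch, assuming the example data from the question; n_missing and keep_vec are illustrative names, not part of the answer above:

```r
# Example data from the question
nad <- data.frame(
  x1 = 1:5,
  x2 = c(NA, 2L, 2L, NA, 34L),
  x3 = c(NA, 1L, NA, NA, NA),
  x4 = c(NA, 2L, 5L, NA, NA),
  x5 = c(NA, 3L, NA, NA, NA),
  x6 = c(NA, 4L, NA, NA, NA)
)

# Number of missing values per row, ignoring the first column
n_missing <- rowSums(is.na(nad[-1]))

# Keep rows where the first column is present and the remaining
# columns are either all present or all missing
keep_vec <- !is.na(nad[[1]]) & (n_missing == 0 | n_missing == ncol(nad) - 1)
nad[keep_vec, ]
```

Here the element-wise `|` is needed because the comparison runs over whole vectors, unlike the scalar test inside the row-wise function.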
Upvotes: 2