Reputation: 1145
I have the following data set:
ID <- c(1,2,3,4,5,6,7,8,9,10)
x1 <- c(1.3, 1.4, NA, NA, 1.4, -1.0, NA, 0.3, 0.7, NA)
x2 <- c(4.6, 2.6, NA, 4.3, NA, 5.6, NA, 3.7, 5.3, NA)
x3 <- c(-0.9, 5.6, NA, -1.3, NA, -3.4, NA, 0.3, -2.6, NA)
x4 <- c(10.5, NA, NA, 0.1, -0.5, NA, NA, 21.5, 2.0, NA)
x5 <- c(9.5, -5.0, NA, -0.7, 3.6, 3.8, -7.8, 9.8, -12.2, NA)
x6 <- c(-10.3, NA, -4.4, NA, 12.2, NA, NA, -4.1, 3.3, NA)
alldata <- data.frame(ID,x1,x2,x3,x4,x5,x6)
ID x1 x2 x3 x4 x5 x6
1 1.3 4.6 -0.9 10.5 9.5 -10.3
2 1.4 2.6 5.6 "NA" -5.0 "NA"
3 "NA" "NA" "NA" "NA" "NA" -4.4
4 "NA" 4.3 -1.3 0.1 -0.7 "NA"
5 1.4 "NA" "NA" -0.5 3.6 12.2
6 -1.0 5.6 -3.4 "NA" 3.8 "NA"
7 "NA" "NA" "NA" "NA" -7.8 "NA"
8 0.3 3.7 0.3 21.5 9.8 -4.1
9 0.7 5.3 -2.6 2.0 -12.2 3.3
10 "NA" "NA" "NA" "NA" "NA" "NA"
I need to delete any row if the values of x1-x5 are ALL "NA", and I don't care whether x6 has a value or "NA".
So my data would look like:
ID x1 x2 x3 x4 x5 x6
1 1.3 4.6 -0.9 10.5 9.5 -10.3
2 1.4 2.6 5.6 "NA" -5.0 "NA"
4 "NA" 4.3 -1.3 0.1 -0.7 "NA"
5 1.4 "NA" "NA" -0.5 3.6 12.2
6 -1.0 5.6 -3.4 "NA" 3.8 "NA"
7 "NA" "NA" "NA" "NA" -7.8 "NA"
8 0.3 3.7 0.3 21.5 9.8 -4.1
9 0.7 5.3 -2.6 2.0 -12.2 3.3
Upvotes: 0
Views: 111
Reputation: 2166
Here is a method using rowSums:
First I convert your factor NA
to actual NA
:
str(alldata)
'data.frame': 10 obs. of 7 variables:
$ ID: num 1 2 3 4 5 6 7 8 9 10
$ x1: Factor w/ 6 levels "-1","0.3","0.7",..: 4 5 NA NA 5 1 NA 2 3 NA
$ x2: Factor w/ 7 levels "2.6","3.7","4.3",..: 4 1 NA 3 NA 6 NA 2 5 NA
$ x3: Factor w/ 7 levels "-0.9","-1.3",..: 1 6 NA 2 NA 4 NA 5 3 NA
$ x4: Factor w/ 6 levels "-0.5","0.1","10.5",..: 3 NA NA 2 1 NA NA 5 4 NA
$ x5: Factor w/ 9 levels "-0.7","-12.2",..: 7 3 NA 1 5 6 4 8 2 NA
$ x6: Factor w/ 6 levels "-10.3","-4.1",..: 1 NA 3 NA 4 NA NA 2 5 NA
alldata[alldata=="NA"]=NA
sum(is.na(alldata))
24
Next, I demonstrate how we can extract which rows have NA
values in all the meaningful variables:
which(rowSums(is.na(alldata[,c("x1","x2","x3","x4","x5")]))==5)
[1] 3 10
Finally we extract the wanted rows (those that do not have NA
in all the key variables):
alldata[-which(rowSums(is.na(alldata[,c("x1","x2","x3","x4","x5")]))==5),]
ID x1 x2 x3 x4 x5 x6
1 1 1.3 4.6 -0.9 10.5 9.5 -10.3
2 2 1.4 2.6 5.6 <NA> -5 <NA>
4 4 <NA> 4.3 -1.3 0.1 -0.7 <NA>
5 5 1.4 <NA> <NA> -0.5 3.6 12.2
6 6 -1 5.6 -3.4 <NA> 3.8 <NA>
7 7 <NA> <NA> <NA> <NA> -7.8 <NA>
8 8 0.3 3.7 0.3 21.5 9.8 -4.1
9 9 0.7 5.3 -2.6 2 -12.2 3.3
Upvotes: 0
Reputation: 78590
You can do this with:
alldata_filtered <- alldata[rowSums(!is.na(alldata[2:6])) > 0, ]
To take this apart:
alldata[2:6]
gets you the x1 to x5 columns you care about. (Better practice may be to do subset(alldata, select = x1:x5)
so that you're not relying on exact column indices). Then
!is.na(alldata[2:6])
gives a TRUE/FALSE matrix showing which of them are not NA,
rowSums(!is.na(alldata[2:6]))
tells you how many items in each row are not NA,
rowSums(!is.na(alldata[2:6])) > 0
tells you which rows have at least one not-NA item, and
alldata[rowSums(!is.na(alldata[2:6])) > 0, ]
filters for only those rows.
Upvotes: 3
Reputation: 7190
A base R solution based on your data (read my comment). With real NA
s this solution won't work I think.
alldata[!rowSums(alldata[2:6] == "NA") == 5, ]
ID x1 x2 x3 x4 x5 x6
1 1 1.3 4.6 -0.9 10.5 9.5 -10.3
2 2 1.4 2.6 5.6 NA -5 NA
4 4 NA 4.3 -1.3 0.1 -0.7 NA
5 5 1.4 NA NA -0.5 3.6 12.2
6 6 -1 5.6 -3.4 NA 3.8 NA
7 7 NA NA NA NA -7.8 NA
8 8 0.3 3.7 0.3 21.5 9.8 -4.1
9 9 0.7 5.3 -2.6 2 -12.2 3.3
Upvotes: 0