Reputation: 1145

Delete rows based on the values of specific columns

I have the following data set:

 ID <- c(1,2,3,4,5,6,7,8,9,10)
x1 <- c(1.3,    1.4,    NA, NA, 1.4,    -1.0,   NA, 0.3,    0.7,    NA)
x2 <- c(4.6,    2.6,    NA, 4.3,    NA, 5.6,    NA, 3.7,    5.3,    NA)
x3 <- c(-0.9,   5.6,    NA, -1.3,   NA, -3.4,   NA, 0.3,    -2.6,   NA)
x4 <- c(10.5,   NA, NA, 0.1,    -0.5,   NA, NA, 21.5,   2.0,    NA)
x5 <- c(9.5,    -5.0,   NA, -0.7,   3.6,    3.8,    -7.8,   9.8,    -12.2,  NA)
x6 <- c(-10.3,  NA, -4.4,   NA, 12.2,   NA, NA, -4.1,   3.3,    NA)

alldata <- data.frame(ID,x1,x2,x3,x4,x5,x6)

ID  x1  x2  x3  x4  x5  x6
1   1.3 4.6 -0.9    10.5    9.5 -10.3
2   1.4 2.6 5.6 "NA"    -5.0    "NA"
3   "NA"    "NA"    "NA"    "NA"    "NA"    -4.4
4   "NA"    4.3 -1.3    0.1 -0.7    "NA"
5   1.4 "NA"    "NA"    -0.5    3.6 12.2
6   -1.0    5.6 -3.4    "NA"    3.8 "NA"
7   "NA"    "NA"    "NA"    "NA"    -7.8    "NA"
8   0.3 3.7 0.3 21.5    9.8 -4.1
9   0.7 5.3 -2.6    2.0 -12.2   3.3
10  "NA"    "NA"    "NA"    "NA"    "NA"    "NA"

I need to delete any row if the values of x1-x5 are ALL "NA", and I don't care whether x6 has a value or "NA".

So my data would look like:

ID  x1  x2  x3  x4  x5  x6
1   1.3 4.6 -0.9    10.5    9.5 -10.3
2   1.4 2.6 5.6 "NA"    -5.0    "NA"
4   "NA"    4.3 -1.3    0.1 -0.7    "NA"
5   1.4 "NA"    "NA"    -0.5    3.6 12.2
6   -1.0    5.6 -3.4    "NA"    3.8 "NA"
7   "NA"    "NA"    "NA"    "NA"    -7.8    "NA"
8   0.3 3.7 0.3 21.5    9.8 -4.1
9   0.7 5.3 -2.6    2.0 -12.2   3.3

Upvotes: 0

Answers (3)

bjoseph

Reputation: 2166

Here is a method using rowSums:

First I convert your factor NA to actual NA:

str(alldata)
'data.frame':   10 obs. of  7 variables:
 $ ID: num  1 2 3 4 5 6 7 8 9 10
 $ x1: Factor w/ 6 levels "-1","0.3","0.7",..: 4 5 NA NA 5 1 NA 2 3 NA
 $ x2: Factor w/ 7 levels "2.6","3.7","4.3",..: 4 1 NA 3 NA 6 NA 2 5 NA
 $ x3: Factor w/ 7 levels "-0.9","-1.3",..: 1 6 NA 2 NA 4 NA 5 3 NA
 $ x4: Factor w/ 6 levels "-0.5","0.1","10.5",..: 3 NA NA 2 1 NA NA 5 4 NA
 $ x5: Factor w/ 9 levels "-0.7","-12.2",..: 7 3 NA 1 5 6 4 8 2 NA
 $ x6: Factor w/ 6 levels "-10.3","-4.1",..: 1 NA 3 NA 4 NA NA 2 5 NA

alldata[alldata=="NA"]=NA


sum(is.na(alldata))
    24

Next, I demonstrate how we can extract which rows have NA values in all the meaningful variables:

which(rowSums(is.na(alldata[,c("x1","x2","x3","x4","x5")]))==5)
[1]  3 10

Finally we extract the wanted rows (those that do not have NA in all the key variables):

 alldata[-which(rowSums(is.na(alldata[,c("x1","x2","x3","x4","x5")]))==5),]
  ID   x1   x2   x3   x4    x5    x6
1  1  1.3  4.6 -0.9 10.5   9.5 -10.3
2  2  1.4  2.6  5.6 <NA>    -5  <NA>
4  4 <NA>  4.3 -1.3  0.1  -0.7  <NA>
5  5  1.4 <NA> <NA> -0.5   3.6  12.2
6  6   -1  5.6 -3.4 <NA>   3.8  <NA>
7  7 <NA> <NA> <NA> <NA>  -7.8  <NA>
8  8  0.3  3.7  0.3 21.5   9.8  -4.1
9  9  0.7  5.3 -2.6    2 -12.2   3.3

Upvotes: 0

David Robinson

Reputation: 78590

You can do this with:

alldata_filtered <- alldata[rowSums(!is.na(alldata[2:6])) > 0, ]

To take this apart:

alldata[2:6]

gets you the x1 to x5 columns you care about. (Better practice may be to do subset(alldata, select = x1:x5) so that you're not relying on exact column indices). Then

!is.na(alldata[2:6])

gives a TRUE/FALSE matrix showing which of them are not NA,

rowSums(!is.na(alldata[2:6]))

tells you how many items in each row are not NA,

rowSums(!is.na(alldata[2:6])) > 0

tells you which rows have at least one not-NA item, and

alldata[rowSums(!is.na(alldata[2:6])) > 0, ]

filters for only those rows.

Upvotes: 3

SabDeM

Reputation: 7190

A base R solution based on your data (read my comment). With real NAs this solution won't work I think.

alldata[!rowSums(alldata[2:6] == "NA") == 5, ]
  ID  x1  x2   x3   x4    x5    x6
1  1 1.3 4.6 -0.9 10.5   9.5 -10.3
2  2 1.4 2.6  5.6   NA    -5    NA
4  4  NA 4.3 -1.3  0.1  -0.7    NA
5  5 1.4  NA   NA -0.5   3.6  12.2
6  6  -1 5.6 -3.4   NA   3.8    NA
7  7  NA  NA   NA   NA  -7.8    NA
8  8 0.3 3.7  0.3 21.5   9.8  -4.1
9  9 0.7 5.3 -2.6    2 -12.2   3.3

Upvotes: 0

Delete rows based on the values of specific columns

Answers (3)

Related Questions