Reputation: 839

R: Selection of rows in data frame includes NA

My data looks like this after import:

A = data.frame( ID= c(1,2,3,4,5,6), Name = c(NA,"A",NA,NA,NA,"B"))

> A
  ID Name
1  1 <NA>
2  2    A
3  3 <NA>
4  4 <NA>
5  5 <NA>
6  6    B

I expect this result when I select all rows with Name == "A":

   ID Name
2  2    A

However, I get 5 rows:

> A[A$Name=="A",]
     ID Name
NA   NA <NA>
2     2    A
NA.1 NA <NA>
NA.2 NA <NA>
NA.3 NA <NA>

Note that I am not looking for complete.cases(), since the data frame has many more columns. I also specified the na.strings parameter during import, read.csv(...,na.strings = NA). The missing values in the csv file are NA rather than "NA", and changing that setting during import made no difference.

Upvotes: 1

Answers (5)

Cath

Reputation: 24074

You can also use %in% instead of ==:

A[A$Name %in% "A", ]
#   ID Name
#2  2    A
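
As far as I can tell, this works because %in% is built on match() and returns FALSE, not NA, when the left-hand value is missing. A minimal sketch with the data from the question (stringsAsFactors = FALSE and the names eq_idx, in_idx and res_in are my additions, for illustration only):

```r
A <- data.frame(ID = c(1, 2, 3, 4, 5, 6),
                Name = c(NA, "A", NA, NA, NA, "B"),
                stringsAsFactors = FALSE)  # assumption: keep Name as character

# == propagates NA, so this logical index contains NA entries
eq_idx <- A$Name == "A"

# %in% never returns NA: a missing value simply does not match
in_idx <- A$Name %in% "A"

anyNA(eq_idx)  # TRUE
anyNA(in_idx)  # FALSE

# Only the matching row is kept
res_in <- A[in_idx, ]
```

The same form also handles several values at once, e.g. A$Name %in% c("A", "B").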

Upvotes: 2

akrun

Reputation: 887118

Here is one way: convert to a data.table, set 'Name' as the key column, and then look up the key value.

library(data.table)
setDT(A, key='Name')['A']
#   ID Name
#1:  2    A

Upvotes: 1

Marta

Reputation: 3162

Try this:

> A[which(A$Name=="A"), ]
  ID Name
2  2    A
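
A short sketch of why this helps, using the data from the question: which() returns the integer positions of the TRUE entries only, so NA and FALSE are both dropped before indexing. The names pos and res_which are hypothetical, chosen for illustration.

```r
A <- data.frame(ID = c(1, 2, 3, 4, 5, 6),
                Name = c(NA, "A", NA, NA, NA, "B"),
                stringsAsFactors = FALSE)  # assumption: keep Name as character

# Integer positions where the comparison is TRUE; NA and FALSE are discarded
pos <- which(A$Name == "A")
pos  # 2

# Indexing by position cannot introduce NA rows
res_which <- A[pos, ]
```

One caveat: avoid negating the result, as in A[-which(cond), ]. If nothing matches, which() returns integer(0) and the negative index selects zero rows instead of all of them.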

Upvotes: 4

r.bot

Reputation: 5424

Yes, this is apparently desired behaviour of R.

Try

A = data.frame( ID= c(1,2,3,4,5,6), Name = c(NA,"A",NA,NA,NA,"B"))

A[A$Name=="A" & !is.na(A$Name),]
   ID Name
2  2    A

This is because comparing NA to a value evaluates to NA, not TRUE or FALSE, and an NA in a logical row index returns a row filled with NA:

"B" == "A"
[1] FALSE
"A" == "A"
[1] TRUE
NA == "A"
[1] NA
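
To make the second half of that explicit, here is a small sketch of how an NA inside a logical index behaves, first on a plain vector and then on the data frame from the question. The names x, idx, bad and good are made up for illustration.

```r
# On a vector: an NA in the logical index yields an NA element
x <- c(10, 20, 30)
idx <- c(NA, TRUE, FALSE)
x[idx]  # NA 20

# On the data frame: each NA in the index yields a row of NAs
A <- data.frame(ID = c(1, 2, 3, 4, 5, 6),
                Name = c(NA, "A", NA, NA, NA, "B"),
                stringsAsFactors = FALSE)  # assumption: keep Name as character
bad <- A[A$Name == "A", ]
nrow(bad)  # 5: four NA rows plus the real match

# Masking the NAs turns them into FALSE, so only the match remains
good <- A[A$Name == "A" & !is.na(A$Name), ]
nrow(good)  # 1
```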

Upvotes: 1

CuriousBeing

Reputation: 1632

To see the result you need, try this:

> subset(A,Name=="A")
  ID Name
2  2    A
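
This works because, according to the documentation for subset(), missing values in the condition are taken as FALSE. A rough sketch comparing it with the explicit form (res_subset and res_manual are illustrative names, not from the question):

```r
A <- data.frame(ID = c(1, 2, 3, 4, 5, 6),
                Name = c(NA, "A", NA, NA, NA, "B"),
                stringsAsFactors = FALSE)  # assumption: keep Name as character

# subset() treats NA in the condition as FALSE
res_subset <- subset(A, Name == "A")

# Roughly the same selection written out by hand
res_manual <- A[!is.na(A$Name) & A$Name == "A", ]

res_subset$ID  # 2
res_manual$ID  # 2
```

Note that ?subset describes it as a convenience function for interactive use, so inside functions the explicit [ form may be the safer choice.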

Upvotes: 5
