Reputation: 171
I have the following data frame:
structure(list(Species = 1:4, Ni = c(1, NA, 1, 1), Zn = c(1,
1, 1, 1), Cu = c(NA, NA, 1, NA)), .Names = c("Species", "Ni",
"Zn", "Cu"), row.names = c(NA, -4L), class = "data.frame")
and I would like to get a vector containing all the species where Ni = 1
, Zn = 1
and Cu = NA
. So in this example that would be (1,4)
I thought I could have a try with the R script select * from where
, but I can't seem to install the package RMySQL on RStudio (R version 2.15.1).
Upvotes: 0
Views: 2686
Reputation: 14852
df <- structure(list(Species=1:4,Ni=c(1,NA,1,1),Zn=c(1,1,1,1),Cu=c(NA,NA,1,NA)),
.Names=c("Species","Ni","Zn","Cu"),row.names=c(NA,-4L),class="data.frame")
with(df, Species[Ni %in% 1 & Zn %in% 1 & Cu %in% NA])
[1] 1 4
Rather than using Ni == 1
you should use Ni %in% 1
, as the former will return NA
elements where Ni
is NA
. Cu %in% NA
produces the same result as is.na(Cu)
.
with(df, Species[Ni == 1 & Zn %in% 1 & Cu %in% NA])
[1] 1 NA 4
Note though that Ni == 1
used in subset
, as in @MadScone's answer, does not suffer from this (which came as a surprise to me).
subset(df, Ni == 1 & Zn == 1 & is.na(Cu), Species)
Species
1 1
4 4
Upvotes: 4
Reputation: 81693
df <- structure(list(Species = 1:4, Ni = c(1, NA, 1, 1), Zn = c(1, 1, 1, 1),
Cu = c(NA, NA, 1, NA)),
.Names = c("Species", "Ni", "Zn", "Cu"), row.names = c(NA, -4L),
class = "data.frame")
If the columns Ni
, Zn
, and Cu
contain 1
and NA
only, you could simply use:
subset(df, Ni & Zn & is.na(Cu), Species)
Upvotes: 0
Reputation: 75205
Edit:
I stand corrected by Backlin
...
It is much better to use %in%
rather than == x & !is.na()
!
MadScone
's suggestion of using subset()
is better yet, since the result remains a data.frame, even when one select only one column for the output, i.e.
> class(subset(df, Ni == 1 & Zn == 1 & is.na(Cu), Species))
[1] "data.frame"
#whereby we get a vector when only one column is selected...
> class(df[df$Ni %in% 1 & df$Zn %in% 1 & is.na(df$Cu), 1])
[1] "integer"
# but we get data.frame when using multiple columns...
> class(df[df$Ni %in% 1 & df$Zn %in% 1 & is.na(df$Cu), 1:2])
[1] "data.frame"
I'm just leaving my sub-par answer to mention this alternative idiom, as one one should avoid!
Setup:
> df <- structure(list(Species = 1:4, Ni = c(1, NA, 1, 1), Zn = c(1,
1, 1, 1), Cu = c(NA, NA, 1, NA)), .Names = c("Species", "Ni",
"Zn", "Cu"), row.names = c(NA, -4L), class = "data.frame")
> df
Species Ni Zn Cu
1 1 1 1 NA
2 2 NA 1 NA
3 3 1 1 1
4 4 1 1 NA
Query:
> df[df$Ni == 1 & !is.na(df$Ni)
& df$Zn == 1 & !is.na(df$Zn)
& is.na(!df$Cu), ]
Species Ni Zn Cu
1 1 1 1 NA
4 4 1 1 NA
The trick with NA values is to explicitly exclude them i.e. for example with Ni, request value 1 and !is.na() etc. Failure to do so results in finding records where, say, Ni is NA
As mentioned above the df[df$Ni %in% 1 & df$Zn %in% 1 & is.na(!df$Cu), ]
idiom is much preferable and using subset() is typically better yet.
> df[df$Ni == 1 & df$Zn == 1 & is.na(!df$Cu), ]
Species Ni Zn Cu
1 1 1 1 NA
NA NA NA NA NA # OOPS...
4 4 1 1 NA
Upvotes: 0
Reputation: 7526
Have a look at subset().
x <- structure(list(Species = 1:4, Ni = c(1, NA, 1, 1), Zn = c(1,
1, 1, 1), Cu = c(NA, NA, 1, NA)), .Names = c("Species", "Ni",
"Zn", "Cu"), row.names = c(NA, -4L), class = "data.frame")
subset(x, Ni == 1 & Zn == 1 & is.na(Cu), Species)
Upvotes: 2