Reputation: 4102
I have a dataframe containing locations (loc
), where each location has two zones (type
) and area of disturbances in each zone (area
). From my dataframe, I would like to keep locations that have disturbances in both zones.
I thought I can simply do this by subsetting data by two conditions, however using AND condition (&)
subset(dd, (dd$type == "npr" & dd$area > 0 ) & (dd$type == "buff" & dd$area > 0 ) )
gives me null output
[1] loc type area
<0 rows> (or 0-length row.names)
and using "OR" condition (|)
subset(dd, (dd$type == "npr" & dd$area > 0 ) | (dd$type == "buff" & dd$area > 0 ) )
is not what I want..
loc type area
1 a npr 10
2 a buff 20
4 b buff 10
5 c npr 5
7 d npr 5
8 d buff 5
How to correctly subset locations which have disturbance area > 0 in both zone types?
My dummy data:
loc<-c("a", "a", "b", "b", "c", "c", "d", "d")
type= rep(c("npr", "buff"), 4)
area = c(10,20,0,10,5,0,5,5)
dd<-data.frame(loc, type, area)
Desired output:
loc type area
1 a npr 10
2 a buff 20
3 d npr 5
4 d buff 5
Upvotes: 2
Views: 6021
Reputation: 5456
I would reshape first:
loc<-c("a", "a", "b", "b", "c", "c", "d", "d")
type= rep(c("npr", "buff"), 4)
area = c(10,20,0,10,5,0,5,5)
dd<-data.frame(loc, type, area)
library(reshape2)
dd_wide <- dcast(dd, loc ~ type, value.var = "area")
mask <- subset(dd_wide, buff > 0 & npr > 0)[["loc"]]
subset(dd, loc %in% mask)
# one line:
subset(dd, loc %in% subset(dcast(dd, loc ~ type, value.var = "area"), buff > 0 & npr > 0)[["loc"]])
Upvotes: 2
Reputation: 11908
You'll need to compute a grouped summary to achieve this. That is, you want
to find out for each loc
if all of the area
s in that location are > 0.
I have always found base R a bit awkward for grouped statistics, but here's
one way to achieve that.
First, use tapply()
to determine for each loc
whether it should be
included or not:
(include <- tapply(dd$area, dd$loc, function(x) all(x > 0)))
#> a b c d
#> TRUE FALSE FALSE TRUE
Then we can use loc
values to index that result to get a vector suitable
to subset dd
with:
include[dd$loc]
#> a a b b c c d d
#> TRUE TRUE FALSE FALSE FALSE FALSE TRUE TRUE
dd[include[dd$loc], ]
#> loc type area
#> 1 a npr 10
#> 2 a buff 20
#> 7 d npr 5
#> 8 d buff 5
We can also put these steps together inside a subset()
call to avoid
creating extra variables:
subset(dd, tapply(area, loc, function(x) all(x > 0))[loc])
#> loc type area
#> 1 a npr 10
#> 2 a buff 20
#> 7 d npr 5
#> 8 d buff 5
Alternatively, you could use dplyr:
library(dplyr)
dd %>%
group_by(loc) %>%
filter(all(area > 0))
#> # A tibble: 4 x 3
#> # Groups: loc [2]
#> loc type area
#> <fct> <fct> <dbl>
#> 1 a npr 10
#> 2 a buff 20
#> 3 d npr 5
#> 4 d buff 5
Created on 2018-07-25 by the reprex package (v0.2.0.9000).
Upvotes: 2