Aggregating data issue

Question

Dataset:

# Groups:   SITEPLOT [21,143]
   L4_KEY              HEMIAB SITEPLOT
                       
 1 82g  Downeast Coast      0 ACAD 1  
 2 82g  Downeast Coast      0 ACAD 100
 3 82g  Downeast Coast      0 ACAD 101
 4 82g  Downeast Coast      0 ACAD 102
 5 82g  Downeast Coast      0 ACAD 103
 6 82g  Downeast Coast      0 ACAD 104
 7 82g  Downeast Coast      0 ACAD 105
 8 82g  Downeast Coast      0 ACAD 107
 9 82g  Downeast Coast      0 ACAD 108
10 82g  Downeast Coast      0 ACAD 109
# ... with 21,133 more rows

HEMIAB indicates the abundance of a certain species. This dataset is at the plot level. I want to know how many distinct L4_KEYs have no abundance of this species, i.e. those that do not have any plots with HEMIAB > 0. Since this dataset is not at the L4_KEY level, I'm having a really hard time figuring out what seems like a simple solution. I have tried various dplyr and aggregate solutions but can't get them to work at the L4_KEY level rather than the plot level. Any help would be great.

akrun · Accepted Answer

We can subset the 'L4_KEY' column to the rows where 'HEMIAB' is greater than 0, take the unique elements, and use setdiff against the levels of 'L4_KEY'. The length of the result is the number of keys that don't have any value greater than 0 (in base R). This relies on 'L4_KEY' being a factor; if it is a character column, levels() returns NULL, so use unique(df1$L4_KEY) in its place.

length(setdiff(levels(df1$L4_KEY), unique(df1$L4_KEY[df1$HEMIAB > 0])))
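As a minimal sketch of the setdiff idea on toy data: the data frame below is made up for illustration (only the column names come from the question), and it uses unique() rather than levels() so that it also works when 'L4_KEY' is a character column.

```r
# Toy data: hypothetical values, not taken from the original dataset
df1 <- data.frame(
  L4_KEY   = c("82g", "82g", "58a", "58a", "59f"),
  HEMIAB   = c(0, 0, 0, 3, 0),
  SITEPLOT = c("ACAD 1", "ACAD 100", "X 1", "X 2", "Y 1"),
  stringsAsFactors = FALSE
)

# Keys with at least one plot where the species is present
present_keys <- unique(df1$L4_KEY[df1$HEMIAB > 0])

# Keys that never appear in that set, i.e. no plot has HEMIAB > 0
absent_keys <- setdiff(unique(df1$L4_KEY), present_keys)

absent_keys          # "82g" "59f"
length(absent_keys)  # 2
```

If 'HEMIAB' can contain NA, the comparison yields NA for those rows, so it may be worth wrapping the condition in which() or dropping the missing rows first.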

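Another option is to group by 'L4_KEY', keep the groups in which all values of 'HEMIAB' are less than or equal to 0, ungroup, and get the distinct elements. nrow(out) then gives the number of such keys.

library(dplyr)
out <- df1 %>%
   group_by(L4_KEY) %>%
   # keep only keys where no plot has a positive abundance
   filter(all(HEMIAB <= 0)) %>%
   ungroup() %>%
   # one row per remaining key
   distinct(L4_KEY) %>%
   droplevels()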
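If only the count is needed, one possible variant is to summarise each key into a single presence flag and count the keys where it is FALSE. This is a sketch on made-up toy data, using the 'HEMIAB' column name from the question; the names any_present and n_absent are introduced here for illustration.

```r
library(dplyr)

# Toy data: hypothetical values, not taken from the original dataset
df1 <- data.frame(
  L4_KEY = c("82g", "82g", "58a", "58a", "59f"),
  HEMIAB = c(0, 0, 0, 3, 0),
  stringsAsFactors = FALSE
)

n_absent <- df1 %>%
  group_by(L4_KEY) %>%
  # one row per key: TRUE if any plot has the species
  summarise(any_present = any(HEMIAB > 0)) %>%
  filter(!any_present) %>%
  nrow()

n_absent  # 2
```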
