How to filter using ddply

Question

I have data:

df<- data.frame(Plot = rep(rep(1:3,each = 2),3), Year = rep(1:3,each = 6), 
                D = rep(c(1,1,5,NA,2,2,2,1,5),2), HT = rep(c(NA,NA,NA,NA,3,2,NA,4,5),2))


   Plot Year  D HT
1     1    1  1 NA
2     1    1  1 NA
3     2    1  5 NA
4     2    1 NA NA
5     3    1  2  3
6     3    1  2  2
7     1    2  2 NA
8     1    2  1  4
9     2    2  5  5
10    2    2  1 NA
11    3    2  1 NA
12    3    2  5 NA
13    1    3 NA NA
14    1    3  2  3
15    2    3  2  2
16    2    3  2 NA
17    3    3  1  4
18    3    3  5  5

I understand that using ddply() in the plyr package allows me to count how many of each plot-year combo exists

ddply(df, .(df[,"Plot"], df[,"Year"]), nrow)

However, I also want to determine how many rows exist given that HT is not NA (or alternatively if D > 1). I want to append this info as a new column on the output from ddply above.

I thought about just subsetting the DF for the data argument, but doing so would eliminate possible plot-year combinations which would make adjoining columns from the un-subsetted and subsetted outputs more difficult (i.e., using something like merge()).

Is there a way to subset data in ddply using a subset of values in one or more of the data.frame's columns?

Update: Desired Output

   Plot Year Count HaveHt
1     1    1     2     0
2     1    2     2     1
3     1    3     2     1
4     2    1     2     0
5     2    2     2     1
6     2    3     2     1
7     3    1     2     2
8     3    2     2     0
9     3    3     2     2

I understand that this output would be compiled by appending (cbinding) columns from 2 separate variants of the code...

akrun · Accepted Answer

If we are using plyr, then with summarise the column by specifying the summarise and get the sum of logical vector i.e.. non NA elements of HT (!is.na(HT)) after grouping by the columns and also the count with length

plyr::ddply(df, c("Plot", "Year"), summarise, n = length(HT), HaveHt = sum(!is.na(HT)))

The same can be done with dplyr

library(dplyr)
df %>%
    group_by(Plot, Year) %>% 
    summarise(Count = n(), HaveHt = sum(!is.na(HT)))
# A tibble: 9 x 4
# Groups: Plot [?]
#   Plot  Year Count HaveHt
#      
#1     1     1     2      0
#2     1     2     2      1
#3     1     3     2      1
#4     2     1     2      0
#5     2     2     2      1
#6     2     3     2      1
#7     3     1     2      2
#8     3     2     2      0
#9     3     3     2      2

How to filter using ddply

Answers (1)

Related Questions