How to count how many values were used in a mean() function?

Question

I am trying to create a column in a data frame that records how many values went into the mean() calculation for each row.

First, I had a data frame df like this:

df <- data.frame(tree_id = rep(c("CHC01", "CHC02"), each = 8),
                 rad = c(rep("A", 4), rep("B", 4), rep("A", 4), rep("C", 4)),
                 year = rep(2015:2018, 4),
                 growth = c(NA, NA, 1.2, 3.2, 2.1, 1.5, 2.3, 2.7,
                            NA, NA, NA, 1.7, 3.5, 1.4, 2.3, 2.7))

Then, I created a new data frame called avg_df, containing only the mean values of growth grouped by tree_id and year:

library(dplyr)

# df is not grouped yet, so the add argument is not needed here
avg_df <- df %>%
  group_by(tree_id, year) %>%
  summarise(avg_growth = mean(growth, na.rm = TRUE))

Now, I would like to add a new column to avg_df containing how many values were used to calculate the mean growth for each tree_id and year, ignoring the NAs.

Example: for CHC01 in 2015, the result is 1, because it is the average of 2.1 and NA, and for CHC01 in 2018, it is 2, because the result is the average of 3.2 and 2.7.

Here is the expected output:

avg_df$radii <- c(1,1,2,2,1,1,1,2)

tree_id  year avg_growth radii

CHC01    2015       2.1      1
CHC01    2016       1.5      1
CHC01    2017       1.75     2
CHC01    2018       2.95     2
CHC02    2015       3.5      1
CHC02    2016       1.4      1
CHC02    2017       2.3      1
CHC02    2018       2.2      2

*In my real data, the values in radii will vary from 1 to 4.

Could anyone help me with this?

Thank you very much!

akrun · Accepted Answer

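After grouping by 'tree_id' and 'year', we can count the non-NA elements with sum(!is.na(growth)): !is.na(growth) is TRUE for every value that is present, and summing a logical vector counts its TRUEs.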

library(dplyr)
df %>%
    group_by(tree_id, year) %>% 
    summarise(avg_growth=mean(growth, na.rm = TRUE), 
              radii = sum(!is.na(growth)))
# A tibble: 8 x 4
# Groups:   tree_id [2]
#  tree_id  year avg_growth radii
#1 CHC01    2015       2.1      1
#2 CHC01    2016       1.5      1
#3 CHC01    2017       1.75     2
#4 CHC01    2018       2.95     2
#5 CHC02    2015       3.5      1
#6 CHC02    2016       1.4      1
#7 CHC02    2017       2.3      1
#8 CHC02    2018       2.2      2
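
To see why sum(!is.na(growth)) gives the count, it may help to look at a single group in isolation. The sketch below uses a made-up vector x (not part of the question's data frame) standing in for the growth values of one tree_id/year group:

```r
# Hypothetical values for one group: one missing, one observed
x <- c(NA, 2.1)

# is.na() flags the missing entries; negating it flags the observed ones
observed <- !is.na(x)

# Summing a logical vector counts its TRUE values
n_used <- sum(observed)

# mean() with na.rm = TRUE averages only the observed values
avg <- mean(x, na.rm = TRUE)

print(observed)
print(n_used)
print(avg)
```

If every value in a group is NA, the count is 0 and mean(..., na.rm = TRUE) returns NaN, so a radii of 0 would mark groups with no usable measurements.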

Or using data.table

library(data.table)
setDT(df)[, .(avg_growth = mean(growth, na.rm = TRUE), 
              radii = sum(!is.na(growth))), by = .(tree_id, year)]
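
Note that setDT() converts df to a data.table by reference, so df itself changes class. If that is not wanted, one option is to work on a copy made with as.data.table(). This is only a sketch, not part of the original answer, and the names dt and res are illustrative:

```r
library(data.table)

# Same example data as in the question
df <- data.frame(tree_id = rep(c("CHC01", "CHC02"), each = 8),
                 rad = c(rep("A", 4), rep("B", 4), rep("A", 4), rep("C", 4)),
                 year = rep(2015:2018, 4),
                 growth = c(NA, NA, 1.2, 3.2, 2.1, 1.5, 2.3, 2.7,
                            NA, NA, NA, 1.7, 3.5, 1.4, 2.3, 2.7))

# as.data.table() returns a copy, leaving df as a plain data.frame
dt <- as.data.table(df)

# Mean and count of non-NA values per tree_id and year
res <- dt[, .(avg_growth = mean(growth, na.rm = TRUE),
              radii = sum(!is.na(growth))),
          by = .(tree_id, year)]

print(res)
```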
