Reputation: 57
I am trying to create a column in a data frame containing how many values were used by the mean function for each row.
First, I had a data frame df like this:
df <- data.frame(tree_id = rep(c("CHC01", "CHC02"), each = 8),
                 rad = c(rep("A", 4), rep("B", 4), rep("A", 4), rep("C", 4)),
                 year = rep(2015:2018, 4),
                 growth = c(NA, NA, 1.2, 3.2, 2.1, 1.5, 2.3, 2.7,
                            NA, NA, NA, 1.7, 3.5, 1.4, 2.3, 2.7))
Then I created a new data frame called avg_df, containing only the mean values of growth grouped by tree_id and year:
library(dplyr)
# df is not grouped yet, so the deprecated add = TRUE argument is not needed
avg_df <- df %>%
  group_by(tree_id, year) %>%
  summarise(avg_growth = mean(growth, na.rm = TRUE))
Now I would like to add a new column to avg_df containing how many values were used to calculate the mean growth for each tree_id and year, ignoring NA.
Example: for CHC01 in 2015 the result is 1, because the mean was taken over 2.1 and NA; for CHC01 in 2018 it is 2, because the mean was taken over 3.2 and 2.7.
Here is the expected output:
avg_df$radii <- c(1,1,2,2,1,1,1,2)
tree_id  year  avg_growth  radii
CHC01    2015  2.1         1
CHC01    2016  1.5         1
CHC01    2017  1.75        2
CHC01    2018  2.95        2
CHC02    2015  3.5         1
CHC02    2016  1.4         1
CHC02    2017  2.3         1
CHC02    2018  2.2         2
*In my real data, the values in radii will vary from 1 to 4.
Could anyone help me with this?
Thank you very much!
Upvotes: 2
Views: 403
Reputation: 887511
We can get the sum of non-NA elements (!is.na(growth)) after grouping by 'tree_id' and 'year':
library(dplyr)
df %>%
  group_by(tree_id, year) %>%
  summarise(avg_growth = mean(growth, na.rm = TRUE),
            radii = sum(!is.na(growth)))
# A tibble: 8 x 4
# Groups: tree_id [2]
# tree_id year avg_growth radii
# <fct> <int> <dbl> <int>
#1 CHC01 2015 2.1 1
#2 CHC01 2016 1.5 1
#3 CHC01 2017 1.75 2
#4 CHC01 2018 2.95 2
#5 CHC02 2015 3.5 1
#6 CHC02 2016 1.4 1
#7 CHC02 2017 2.3 1
#8 CHC02 2018 2.2 2
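To see why summing !is.na(growth) gives the number of values behind each mean, here is a minimal sketch on a single group, using the two CHC01 readings for 2015 from the question. The variable names (growth_2015, n_used and so on) are only for illustration, and the all-missing case at the end is an assumed edge case that does not occur in the example data.

```r
# Growth readings for CHC01 in 2015 from the question:
# radius A is NA, radius B is 2.1
growth_2015 <- c(NA, 2.1)

observed <- !is.na(growth_2015)              # logical vector: FALSE TRUE
n_used   <- sum(observed)                    # TRUE counts as 1, so this is 1
avg      <- mean(growth_2015, na.rm = TRUE)  # 2.1, the mean of the one observed value

# Edge case (not present in the question's data): every reading is missing
all_missing <- c(NA_real_, NA_real_)
n_empty   <- sum(!is.na(all_missing))          # 0
avg_empty <- mean(all_missing, na.rm = TRUE)   # NaN, because nothing is left to average
```

If a tree_id/year group could be entirely NA in the real data, the count column would show 0 next to a NaN mean, which makes such groups easy to spot or filter out.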
Or using data.table
library(data.table)
setDT(df)[, .(avg_growth = mean(growth, na.rm = TRUE),
              radii = sum(!is.na(growth))),
          by = .(tree_id, year)]
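If neither package is available, the same idea can probably be expressed in base R as well. The following is a sketch rather than part of the original answer: it runs aggregate() twice (once for the mean, once for the count) and joins the results with merge(). The intermediate names avg and cnt are made up for this example.

```r
# Same example data as in the question
df <- data.frame(
  tree_id = rep(c("CHC01", "CHC02"), each = 8),
  rad     = c(rep("A", 4), rep("B", 4), rep("A", 4), rep("C", 4)),
  year    = rep(2015:2018, 4),
  growth  = c(NA, NA, 1.2, 3.2, 2.1, 1.5, 2.3, 2.7,
              NA, NA, NA, 1.7, 3.5, 1.4, 2.3, 2.7)
)

# na.action = na.pass keeps the NA rows so that FUN sees them;
# by default the formula method drops incomplete rows before grouping
avg <- aggregate(growth ~ tree_id + year, data = df,
                 FUN = function(x) mean(x, na.rm = TRUE),
                 na.action = na.pass)
cnt <- aggregate(growth ~ tree_id + year, data = df,
                 FUN = function(x) sum(!is.na(x)),
                 na.action = na.pass)

# Give the two result columns the names used in the question
names(avg)[names(avg) == "growth"] <- "avg_growth"
names(cnt)[names(cnt) == "growth"] <- "radii"

# Join on tree_id and year; merge() sorts the result by these key columns
avg_df <- merge(avg, cnt, by = c("tree_id", "year"))
avg_df
```

The two-pass approach is more verbose than the dplyr and data.table versions, but it needs no extra packages and keeps the mean and the count as ordinary columns.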
Upvotes: 1