Reputation: 25
I have an assignment with a table of data containing observations of animals of different species, measured on different occasions. The 'weight' column has missing values that I'm supposed to replace with the mean weight for the species the animal belongs to. For example, the mean weight for the species "albigula" is 148, so I want 148 to replace NA in the two cases where the animal's weight was not recorded, so that I have a complete data set. I then need to repeat this process for another 10 or so species.
I cannot think of a way to do this apart from the following:
library(dplyr)
# Keep only the rows for one species, then take its mean weight, ignoring NAs
albigula <- filter(surveys_combined_year, species == "albigula")
albigula$weight %>% mean(na.rm = TRUE)
However, this obviously doesn't work, as I cannot impute the mean value into its specific spot in "surveys_combined_year$weight".
Sorry for the likely super beginner question, I've searched all the resources we've been given in class and I still can't seem to understand what I'm missing.
Help me please!
Upvotes: 1
Views: 37
Reputation: 887048
We can do a group_by with replace_na. Grouped by 'species', replace the NA elements in 'weight' (replace_na) by the mean of 'weight':
library(dplyr)
library(tidyr)

# Within each species, fill missing weights with that species' mean weight
out <- surveys_combined_year %>%
  group_by(species) %>%
  mutate(weight = replace_na(weight, mean(weight, na.rm = TRUE)))
EDIT - changed replace to replace_na (comments from @BenBolker)
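To see what this does, here is a minimal sketch on a small made-up data frame. The name `toy`, the second species, and all the weights are invented for illustration and are not taken from surveys_combined_year. The "albigula" weights were chosen so that their mean is 148, as in the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (not the real survey table): two species, two missing weights
toy <- data.frame(
  species = c("albigula", "albigula", "albigula", "merriami", "merriami"),
  weight  = c(140, NA, 156, 40, NA)
)

filled <- toy %>%
  group_by(species) %>%
  # mean() is computed per species because of group_by();
  # replace_na() only touches the NA entries and leaves recorded weights alone
  mutate(weight = replace_na(weight, mean(weight, na.rm = TRUE))) %>%
  ungroup()  # drop the grouping so later steps work on the whole table

filled$weight
# 140 148 156 40 40
```

One thing to watch for: if a species has no recorded weights at all, mean(..., na.rm = TRUE) returns NaN for that group, so those rows will still show up as missing afterwards.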
Upvotes: 4