Jams

Reputation: 25

How to impute means into specific observations in a column?

I have an assignment that uses a table of observations of animals from several species, each measured on different occasions. The 'weight' column has missing values that I'm supposed to replace with the mean weight of the animal's species. For example, the mean weight for the species "albigula" is 148, so that value should replace the NA in the two cases where an albigula's weight was not recorded, giving a complete data set. I then need to repeat this for another 10 or so species.

The only approach I can think of is the following:

    library(dplyr)

    albigula <- filter(surveys_combined_year, species == "albigula")
    albigula$weight %>% mean(na.rm = TRUE)

However, this only gives me the mean. I can't see how to put that value back into its specific positions in "surveys_combined_year$weight".
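The missing step in the attempt above is writing the mean back into only those rows of the full table where the species is "albigula" and the weight is NA. A minimal base R sketch, using a small made-up data frame in place of surveys_combined_year (the values are invented for illustration and are not the real survey data):

    # Hypothetical toy data standing in for surveys_combined_year
    surveys <- data.frame(
      species = c("albigula", "albigula", "albigula", "merriami", "merriami"),
      weight  = c(140, NA, 156, 40, NA)
    )

    # Mean weight of one species, ignoring missing values
    albigula_mean <- mean(surveys$weight[surveys$species == "albigula"], na.rm = TRUE)

    # Logical index: TRUE only for albigula rows whose weight is missing
    to_fill <- surveys$species == "albigula" & is.na(surveys$weight)

    # Assign the mean into exactly those positions of the original column
    surveys$weight[to_fill] <- albigula_mean

    surveys$weight

This fills only the albigula gap (the merriami NA is untouched), so it would have to be repeated for every species; grouping by species removes that repetition.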

Sorry for the likely very basic question. I've searched all the resources we were given in class and still can't see what I'm missing.

Help me please!

Upvotes: 1

Views: 37

Answers (1)

akrun

Reputation: 887048

We can do a grouped replacement. Grouped by 'species', replace the NA elements in 'weight' (with replace_na) by the mean of 'weight' for that species.

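    library(dplyr)
    library(tidyr)

    # Within each species, replace NA weights with that species' mean weight
    out <- surveys_combined_year %>%
      group_by(species) %>%
      mutate(weight = replace_na(weight, mean(weight, na.rm = TRUE))) %>%
      ungroup()  # drop the grouping so later steps work on the whole table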

EDIT: changed replace to replace_na (per comments from @BenBolker)
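If dplyr and tidyr are not available, base R's ave() can do the same grouped replacement: it applies a function within each level of a grouping vector and returns a vector in the original row order. This swaps in a different technique from the group_by()/replace_na() pipeline, sketched here on made-up toy data (not the real survey values):

    # Hypothetical toy data standing in for surveys_combined_year
    surveys <- data.frame(
      species = c("albigula", "albigula", "albigula", "merriami", "merriami"),
      weight  = c(140, NA, 156, 40, NA)
    )

    # Within each species, replace NA weights with that species' mean weight;
    # ave() puts the results back in the original row order
    surveys$weight <- ave(
      surveys$weight,
      surveys$species,
      FUN = function(w) replace(w, is.na(w), mean(w, na.rm = TRUE))
    )

    surveys$weight

Here the albigula NA becomes 148 and the merriami NA becomes 40. With either approach, a species whose weights are all NA has no mean to use, so mean(..., na.rm = TRUE) returns NaN for that group.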

Upvotes: 4
