I have a dataframe (df1) that looks something like this:
id year sex age value
222 2016 M 5 123
222 2017 F 10 555
224 2015 F 10 212
225 2015 M 25 214
222 2016 M 60 111
224 2016 M 10 642
and so on (around 20,000 rows). For two specific ids, the "M" rows in one of the years contain an error, so for the time being I calculated replacement values as the mean of the other years. They look something like this (df2):
id year sex age mean
222 2016 M 10 123
222 2016 M 5 555
224 2016 M 60 212
224 2016 M 70 214
I want to take these values from df2 and use them to replace the corresponding rows in df1, i.e. the rows with id 222 or 224, year 2016 and sex M. Would the simplest way be to remove those incorrect rows from df1 (the 2016 males for those two ids) and then rbind the corrected data frame? It sounds easy, but I am a bit worried about accidentally removing the wrong rows. Thank you.
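A minimal base-R sketch of the remove-and-rbind plan, using the sample rows above. It assumes that df2's `mean` column should become df1's `value` column, and it prints the flagged rows before anything is dropped, which is a cheap way to check that the filter catches the right ones. The names `bad`, `df2_fixed` and `result` are illustrative only.

```r
# Sample data from the question
df1 <- data.frame(
  id    = c(222, 222, 224, 225, 222, 224),
  year  = c(2016, 2017, 2015, 2015, 2016, 2016),
  sex   = c("M", "F", "F", "M", "M", "M"),
  age   = c(5, 10, 10, 25, 60, 10),
  value = c(123, 555, 212, 214, 111, 642)
)
df2 <- data.frame(
  id   = c(222, 222, 224, 224),
  year = c(2016, 2016, 2016, 2016),
  sex  = c("M", "M", "M", "M"),
  age  = c(10, 5, 60, 70),
  mean = c(123, 555, 212, 214)
)

# Flag the rows to drop: ids 222/224, year 2016, sex M (any age)
bad <- df1$id %in% c(222, 224) & df1$year == 2016 & df1$sex == "M"

# Inspect the flagged rows before deleting anything
print(df1[bad, ])

# Rename df2's "mean" column to "value" so both data frames share column names
df2_fixed <- df2
names(df2_fixed)[names(df2_fixed) == "mean"] <- "value"

# Keep the good rows of df1 and stack the replacement rows underneath
# (rbind on data frames matches columns by name)
result <- rbind(df1[!bad, ], df2_fixed)
```

On these sample rows the filter flags three rows (222/2016/M aged 5 and 60, 224/2016/M aged 10), while df2 has four rows with different ages, so the result has seven rows instead of six. Spotting that kind of mismatch in the printed rows is exactly what the inspection step is for.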
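If your data frame df2 contains only the id-year-sex-age combinations you want to replace (i.e. no other combinations, and each one only once), you can use:
library(dplyr)
# Attach df2's mean to the matching rows of df1 (NA where there is no match)
left_join(df1, df2, by = c("id", "year", "sex", "age")) %>%
  # Use the mean where one exists, otherwise keep the original value
  mutate(value = if_else(!is.na(mean), mean, value)) %>%
  # Drop the helper column so the result has df1's original columns
  select(-mean)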
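Without dplyr, the same join-then-replace logic can be sketched in base R with match() on a pasted id/year/sex/age key. This is an alternative, not the answer's code; the sample data and the names `key_df1`, `key_df2`, `idx`, `hit` and `df1_fixed` are illustrative. Because it only overwrites rows that already exist in df1, it never adds rows and it keeps the original row order.

```r
# Sample data from the question
df1 <- data.frame(
  id    = c(222, 222, 224, 225, 222, 224),
  year  = c(2016, 2017, 2015, 2015, 2016, 2016),
  sex   = c("M", "F", "F", "M", "M", "M"),
  age   = c(5, 10, 10, 25, 60, 10),
  value = c(123, 555, 212, 214, 111, 642)
)
df2 <- data.frame(
  id   = c(222, 222, 224, 224),
  year = c(2016, 2016, 2016, 2016),
  sex  = c("M", "M", "M", "M"),
  age  = c(10, 5, 60, 70),
  mean = c(123, 555, 212, 214)
)

# Build one composite key per row from the four matching columns
key_cols <- c("id", "year", "sex", "age")
key_df1 <- do.call(paste, c(df1[key_cols], sep = "_"))
key_df2 <- do.call(paste, c(df2[key_cols], sep = "_"))

# Position of each df1 key in df2 (NA when there is no match)
idx <- match(key_df1, key_df2)
hit <- !is.na(idx)

# Overwrite only the matched rows with df2's mean
df1_fixed <- df1
df1_fixed$value[hit] <- df2$mean[idx[hit]]
```

On the sample rows only 222/2016/M/5 has a match, so just that value changes. If df2 could contain duplicate keys, match() silently uses the first one (whereas left_join() would duplicate rows), so checking `anyDuplicated(key_df2) == 0` beforehand is worthwhile.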