Replacing incorrect values in R with correct ones from another dataframe

Question

I have a dataframe (df1) that looks something like this:

id   year  sex  age value
222  2016   M    5   123
222  2017   F    10  555
224  2015   F    10  212
225  2015   M    25  214
222  2016   M    60  111
224  2016   M    10  642

and so on (around 20,000 rows). One of the years has an error in the data for two specific ids, and only for sex "M", so for now I calculated mean values from the other years to use in the meantime. It would look something like this (df2):

id   year  sex  age  mean
222  2016   M    10  123
222  2016   M     5  555
224  2016   M    60  212
224  2016   M    70  214

I want to take these values from df2 and use them to replace the ones in df1 that have id 222 or 224, year 2016 and sex M, for the matching ages. Would the simplest way be to remove the incorrect rows from df1 (the males from 2016 for those two ids and ages) and then rbind the corrected rows? It sounds easy, but I am a bit worried about removing the wrong rows. Thank you.

Andrea M · Accepted Answer

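If your dataframe df2 only contains the id-year-sex-age combinations that you want to replace (i.e. doesn't contain any other combinations), you can join it onto df1 and overwrite value wherever a mean was found. Rows of df1 without a match in df2 keep their original value: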

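library(dplyr)
# Attach the df2 means to the matching df1 rows (unmatched rows get NA in mean)
left_join(df1, df2, by = c("id", "year", "sex", "age")) %>%
  # Use the mean where one was found, otherwise keep the original value
  mutate(value = if_else(!is.na(mean), mean, value)) %>%
  select(-mean)  # drop the helper column once value has been updated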
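To see what the join does on the sample data from the question, here is a minimal sketch. The data frames are typed in by hand from the tables above, the names fixed and unmatched are just illustrative, and coalesce() is swapped in for if_else() (it should give the same result here, since it takes mean where it is not NA and value otherwise).

```r
library(dplyr)

# Sample data copied from the question
df1 <- data.frame(
  id    = c(222, 222, 224, 225, 222, 224),
  year  = c(2016, 2017, 2015, 2015, 2016, 2016),
  sex   = c("M", "F", "F", "M", "M", "M"),
  age   = c(5, 10, 10, 25, 60, 10),
  value = c(123, 555, 212, 214, 111, 642)
)
df2 <- data.frame(
  id   = c(222, 222, 224, 224),
  year = c(2016, 2016, 2016, 2016),
  sex  = c("M", "M", "M", "M"),
  age  = c(10, 5, 60, 70),
  mean = c(123, 555, 212, 214)
)

keys <- c("id", "year", "sex", "age")

fixed <- df1 %>%
  left_join(df2, by = keys) %>%
  mutate(value = coalesce(mean, value)) %>%  # df2 mean where matched, else old value
  select(-mean)                              # drop the helper column

# Rows of df1 that found no partner in df2 and so kept their old value
unmatched <- anti_join(df1, df2, by = keys)
```

Because this is a left join, no rows are removed from df1, so there is no risk of dropping the wrong ones: fixed has the same number of rows as df1. In this small sample only the row for id 222, year 2016, M, age 5 has an exact match in df2, so only that value changes. Inspecting unmatched is a quick way to check whether any 2016 male rows for ids 222 and 224 were missed because their age is not in df2.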
