Jeremy K.
Jeremy K.

Reputation: 1792

Overwrite a specific value in a dataframe, based on matching values

My data is in a format like this:

#>   country year value
#> 1     AUS 2019   100
#> 2     USA 2019   120
#> 3     AUS 2018    90
df <- data.frame(stringsAsFactors=FALSE,
     country = c("AUS", "USA", "AUS"),
        year = c(2019, 2019, 2018),
       value = c(100, 120, 90)
)

and I have an one row dataframe that represents a revision that should overwrite the existing record in my data.

#>   country year value
#> 1     AUS 2019   500
df2 <- data.frame(stringsAsFactors=FALSE,
                  country = c("AUS"),
                     year = c(2018),
                    value = c(500)
             )

My desired output is:

#>   country year value
#> 1     AUS 2019   100
#> 2     USA 2019   120
#> 3     AUS 2018   500

I know how to find the row to overwrite:

library(tidyverse)
df %>% filter(country == overwrite$country & year == overwrite$year) %>% 
  mutate(value = overwrite$value)

but how do I put that back in the original dataframe?

Tidyverse answers are easier for me to work with, but I'm open to any solutions.

Upvotes: 3

Views: 376

Answers (3)

Matt
Matt

Reputation: 2987

Using mutate and if_else:

library(tidyverse)

df %>% 
mutate(value = if_else(country %in% df2$country & year %in% df2$year, df2$value, value))

Results in:

country year value
1     AUS 2019   100
2     USA 2019   120
3     AUS 2018   500

Upvotes: 1

Joris C.
Joris C.

Reputation: 6234

One possible tidyverse approach using 1). anti_join to remove the rows from df that will be replaced and 2). bind_rows to add the replacement rows from df2:

library(dplyr)

anti_join(df, df2, by = c("country", "year")) %>% bind_rows(df2)
#>   country year value
#> 1     AUS 2019   100
#> 2     USA 2019   120
#> 3     AUS 2018   500

Or, another one using 1). right_join to join the old and new values and 2). coalesce to keep only the new values:

right_join(df2, df, by = c("country", "year")) %>%
    transmute(country, year, value = coalesce(value.x, value.y))
#>   country year value
#> 1     AUS 2019   100
#> 2     USA 2019   120
#> 3     AUS 2018   500

Upvotes: 1

akrun
akrun

Reputation: 887108

Here, an efficient approach is join on with data.table. Convert the 'data.frame' to 'data.table' (setDT(df)), join on with the 'df2' on 'country', 'year' assign (:=) the 'value' column from second dataset (i.value) to replace the 'value' in original dataset

library(data.table)
setDT(df)[df2, value := i.value, on = .(country, year)]
df
#    country year value
#1:     AUS 2019   100
#2:     USA 2019   120
#3:     AUS 2018   500

Upvotes: 1

Related Questions