Reputation: 1792
My data is in a format like this:
#> country year value
#> 1 AUS 2019 100
#> 2 USA 2019 120
#> 3 AUS 2018 90
df <- data.frame(stringsAsFactors=FALSE,
country = c("AUS", "USA", "AUS"),
year = c(2019, 2019, 2018),
value = c(100, 120, 90)
)
and I have an one row dataframe that represents a revision that should overwrite the existing record in my data.
#> country year value
#> 1 AUS 2019 500
df2 <- data.frame(stringsAsFactors=FALSE,
country = c("AUS"),
year = c(2018),
value = c(500)
)
My desired output is:
#> country year value
#> 1 AUS 2019 100
#> 2 USA 2019 120
#> 3 AUS 2018 500
I know how to find the row to overwrite:
library(tidyverse)
df %>% filter(country == overwrite$country & year == overwrite$year) %>%
mutate(value = overwrite$value)
but how do I put that back in the original dataframe?
Tidyverse answers are easier for me to work with, but I'm open to any solutions.
Upvotes: 3
Views: 376
Reputation: 2987
Using mutate
and if_else
:
library(tidyverse)
df %>%
mutate(value = if_else(country %in% df2$country & year %in% df2$year, df2$value, value))
Results in:
country year value
1 AUS 2019 100
2 USA 2019 120
3 AUS 2018 500
Upvotes: 1
Reputation: 6234
One possible tidyverse approach using 1). anti_join
to remove the rows from df
that will be replaced and 2). bind_rows
to add the replacement rows from df2
:
library(dplyr)
anti_join(df, df2, by = c("country", "year")) %>% bind_rows(df2)
#> country year value
#> 1 AUS 2019 100
#> 2 USA 2019 120
#> 3 AUS 2018 500
Or, another one using 1). right_join
to join the old and new values and 2). coalesce
to keep only the new values:
right_join(df2, df, by = c("country", "year")) %>%
transmute(country, year, value = coalesce(value.x, value.y))
#> country year value
#> 1 AUS 2019 100
#> 2 USA 2019 120
#> 3 AUS 2018 500
Upvotes: 1
Reputation: 887108
Here, an efficient approach is join on
with data.table
. Convert the 'data.frame' to 'data.table' (setDT(df)
), join on
with the 'df2' on 'country', 'year' assign (:=
) the 'value' column from second dataset (i.value
) to replace the 'value' in original dataset
library(data.table)
setDT(df)[df2, value := i.value, on = .(country, year)]
df
# country year value
#1: AUS 2019 100
#2: USA 2019 120
#3: AUS 2018 500
Upvotes: 1