tai-sik
tai-sik

Reputation: 142

How to populate a new variable based on a condition of two other variables

I would like to create a variable that takes a certain value in a certain year, by country.

Country Year Price Price_2018
   A    2016  1      4
   A    2017  3      4
   A    2018  4      4
   B    2016  1      5
   B    2017  7      5
   B    2018  5      5
   C    2016  1      3
   C    2017  6      3
   C    2018  3      3

As above, I'd like to create the variable Price_2018 that, for each country, take the price in 2018 and populates each observation with this value. Would anyone be able to help here? Many thanks in advance.

Upvotes: 0

Views: 68

Answers (2)

akrun
akrun

Reputation: 887231

We can either use == (assuming there are only unique 'Year' for each 'Country')

library(dplyr)
df1 %>%
    group_by(Country) %>%
    mutate(Price_2018 = Price[Year == 2018])

-output

# A tibble: 9 x 4
# Groups:   Country [3]
#  Country  Year Price Price_2018
#  <chr>   <int> <int>      <int>
#1 A        2016     1          4
#2 A        2017     3          4
#3 A        2018     4          4
#4 B        2016     1          5
#5 B        2017     7          5
#6 B        2018     5          5
#7 C        2016     1          3
#8 C        2017     6          3
#9 C        2018     3          3

Or match

df1 %>%
     group_by(Country) %>%
     mutate(Price_2018 = Price[match(2018, Year)])

If we need to create multiple 'Year' column, an easier option is map

library(purrr)
map_dfc(unique(df1$Year), ~ df1 %>%  group_by(Country) %>%
    transmute(!! str_c('Price_', .x) :=  Price[Year == .x]) %>%
    ungroup %>%
    select(-Country)) %>%
    mutate(df1, .)
#  Country Year Price Price_2016 Price_2017 Price_2018
#1       A 2016     1          1          3          4
#2       A 2017     3          1          3          4
#3       A 2018     4          1          3          4
#4       B 2016     1          1          7          5
#5       B 2017     7          1          7          5
#6       B 2018     5          1          7          5
#7       C 2016     1          1          6          3
#8       C 2017     6          1          6          3
#9       C 2018     3          1          6          3

data

df1 <- structure(list(Country = c("A", "A", "A", "B", "B", "B", "C", 
"C", "C"), Year = c(2016L, 2017L, 2018L, 2016L, 2017L, 2018L, 
2016L, 2017L, 2018L), Price = c(1L, 3L, 4L, 1L, 7L, 5L, 1L, 6L, 
3L)), row.names = c(NA, -9L), class = "data.frame")

Upvotes: 1

Duck
Duck

Reputation: 39605

It looks like you want to create a variable for each year, so you can try this approach reshaping data and then merging using tidyverse functions:

library(tidyverse)
#Code
newdf <- df %>% left_join(df %>%
  mutate(Year=paste0('Price_',Year)) %>%
  pivot_wider(names_from = Year,values_from=Price))

Output:

  Country Year Price Price_2016 Price_2017 Price_2018
1       A 2016     1          1          3          4
2       A 2017     3          1          3          4
3       A 2018     4          1          3          4
4       B 2016     1          1          7          5
5       B 2017     7          1          7          5
6       B 2018     5          1          7          5
7       C 2016     1          1          6          3
8       C 2017     6          1          6          3
9       C 2018     3          1          6          3

And if you only want 2018:

#Code 2
newdf <- df %>% left_join(df %>% filter(Year==2018) %>%
  mutate(Year=paste0('Price_',Year)) %>%
  pivot_wider(names_from = Year,values_from=Price))

Output:

  Country Year Price Price_2018
1       A 2016     1          4
2       A 2017     3          4
3       A 2018     4          4
4       B 2016     1          5
5       B 2017     7          5
6       B 2018     5          5
7       C 2016     1          3
8       C 2017     6          3
9       C 2018     3          3

Some data used:

#Data
df <- structure(list(Country = c("A", "A", "A", "B", "B", "B", "C", 
"C", "C"), Year = c(2016L, 2017L, 2018L, 2016L, 2017L, 2018L, 
2016L, 2017L, 2018L), Price = c(1L, 3L, 4L, 1L, 7L, 5L, 1L, 6L, 
3L)), row.names = c(NA, -9L), class = "data.frame")

Upvotes: 2

Related Questions