maplesyrup123
maplesyrup123

Reputation: 303

Add multiple columns lagged by one year

I need to add a 1-year-lagged version of multiple columns from my dataframe. Here's my data:

data<-data.frame(Year=c("2011","2011","2011","2012","2012","2012","2013","2013","2013"), 
                 Country=c("America","China","India","America","China","India","America","China","India"),
                 Value1=c(234,443,754,334,117,112,987,903,476),
                 Value2=c(2,4,5,6,7,8,1,2,2))

And I want to add two columns that contain Value1 and Value2 at t-1, so that my dataframe looks like this:

enter image description here

How can I do this? Would this be the correct way to lag my variables by year?

Thanks in advance!

Upvotes: 1

Views: 148

Answers (3)

zx8754
zx8754

Reputation: 56189

Using data.table:

library(data.table)

setDT(data)
cols <- grep("^Value", colnames(data), value = TRUE)
data[, paste0(cols, "_lag") := lapply(.SD, shift), .SDcols = cols, by = Country]
#    Year Country Value1 Value2 Value1_lag Value2_lag
# 1: 2011 America    234      2         NA         NA
# 2: 2011   China    443      4         NA         NA
# 3: 2011   India    754      5         NA         NA
# 4: 2012 America    334      6        234          2
# 5: 2012   China    117      7        443          4
# 6: 2012   India    112      8        754          5
# 7: 2013 America    987      1        334          6
# 8: 2013   China    903      2        117          7
# 9: 2013   India    476      2        112          8

Upvotes: 4

Ma&#235;l
Ma&#235;l

Reputation: 52069

In dplyr, use lag by group:

library(dplyr) #1.1.0
data %>% 
  mutate(across(contains("Value"), lag, .names = "{col}_lagged"), .by = Country)

  Year Country Value1 Value2 Value1_lagged Value2_lagged
1 2011 America    234      2            NA            NA
2 2011   China    443      4            NA            NA
3 2011   India    754      5            NA            NA
4 2012 America    334      6           234             2
5 2012   China    117      7           443             4
6 2012   India    112      8           754             5
7 2013 America    987      1           334             6
8 2013   China    903      2           117             7
9 2013   India    476      2           112             8

Below 1.1.0:

data %>% 
  group_by(Country) %>%
  mutate(across(c(GDP, Population), lag, .names = "{col}_lagged")) %>%
  ungroup()

Upvotes: 2

Talha Asif
Talha Asif

Reputation: 401

Another way using dplyr to ge tthe job done.

library(dplyr)
data_lagged <- data %>%
  group_by(Country) %>%
  mutate(Value1_Lagged = lag(Value1),
         Value2_Lagged = lag(Value2),
         Year = as.integer(as.character(Year)) + 1)
data_final <- cbind(data, data_lagged[, c("Value1_Lagged", "Value2_Lagged")])
data_final

Output:

  Year Country Value1 Value2 Value1_Lagged Value2_Lagged
1 2011 America    234      2            NA            NA
2 2011   China    443      4            NA            NA
3 2011   India    754      5            NA            NA
4 2012 America    334      6           234             2
5 2012   China    117      7           443             4
6 2012   India    112      8           754             5
7 2013 America    987      1           334             6
8 2013   China    903      2           117             7
9 2013   India    476      2           112             8

Upvotes: 0

Related Questions