Reputation: 1321
I have a dataset with survey score results for 3 hospitals over a number of years. This survey contains 2 questions.
The dataset looks like this -
set.seed(1234)
library(dplyr)
library(tidyr)
dataset= data.frame(Hospital=c(rep('A',10),rep('B',8),rep('C',6)),
YearN=c(2015,2016,2017,2018,2019,
2015,2016,2017,2018,2019,
2015,2016,2017,2018,
2015,2016,2017,2018,
2015,2016,2017,
2015,2016,2017),
Question=c(rep('Overall Satisfaction',5),
rep('Overall Cleanliness',5),
rep('Overall Satisfaction',4),
rep('Overall Cleanliness',4),
rep('Overall Satisfaction',3),
rep('Overall Cleanliness',3)),
ScoreYearN=c(rep(runif(24,min = 0.6,max = 1))),
TotalYearN=c(rep(round(runif(24,min = 1000,max = 5000),0))))
MY OBJECTIVE
To add two columns to the dataset such that -
MY ATTEMPT
I called the first column ScoreYearN-1
and the second column TotalYearN-1
I used the lag
function to create the new columns that contain the lagged values from the existing columns.
library(dplyr)
library(tidyr)
dataset$`ScoreYearN-1`=lag(dataset$ScoreYearN)
dataset$`TotalYearN-1`=lag(dataset$TotalYearN)
Which gives me a resulting dataset where I have the desired outcome for the first five rows only (these rows correspond to the first Hospital-Question combination).
The remaining rows do not account for this grouping, and hence the 2015 'N-1' values take on the values of the previous group.
I'm not sure this is the best way to go about this problem. If you have any better suggestions, I'm happy to consider them.
Any help will be greatly appreciated.
Upvotes: 1
Views: 1050
Reputation: 184
You're close! Just use dplyr
to group by hospital
dataset_lagged <- dataset %>%
group_by(Hospital,Question) %>%
mutate(`ScoreYearN-1` = lag(ScoreYearN),
`TotalYearN-1` = lag(TotalYearN))
Upvotes: 3