Reputation: 580
I have data on work stations were workers worked by day, and I need to find how many days a worker began working in the same station he left off the period day. Each observation is one work-day per worker.
worker.id | start.station | end.station | day
1 | 234 | 342 | 2015-01-02
1 | 342 | 425 | 2015-01-03
1 | 235 | 621 | 2015-01-04
2 | 155 | 732 | 2015-01-02
2 | 318 | 632 | 2015-01-03
2 | 632 | 422 | 2015-01-04
So the desired outcomes would be to generate a variable (same) that identifies days in which worker started at same work station as he left off previous day (with NA
or FALSE
in first observation for each worker).
worker.id | start.station | end.station | day | same
1 | 234 | 342 | 2015-01-02 | FALSE
1 | 342 | 425 | 2015-01-03 | TRUE
1 | 235 | 621 | 2015-01-04 | FALSE
2 | 155 | 732 | 2015-01-02 | FALSE
2 | 318 | 632 | 2015-01-03 | FALSE
2 | 632 | 422 | 2015-01-04 | TRUE
I think something using dplyr
would work, but not sure what.
Thanks!
Upvotes: 0
Views: 322
Reputation: 5193
worker.id<-c(1,1,1,2,2,2)
start.station<-c(234,342,235,155,218,632)
end.station<-c(342,425,621,732,632,422)
end.station<-c(342,425,621,732,632,422)
day<-c("2015-01-02"," 2015-01-03"," 2015-01-04"," 2015-01-02"," 2015-01-03"," 2015-01-04")
df<-data.frame(worker.id, start.station ,end.station, day)
worker.id start.station end.station day
1 1 234 342 2015-01-02
2 1 342 425 2015-01-03
3 1 235 621 2015-01-04
4 2 155 732 2015-01-02
5 2 218 632 2015-01-03
6 2 632 422 2015-01-04
df$same<-ifelse(df$start.station!=lag(df$end.station) |
df$day=="2015-01-02", "FALSE","TRUE")
worker.id start.station end.station day same
1 1 234 342 2015-01-02 FALSE
2 1 342 425 2015-01-03 TRUE
3 1 235 621 2015-01-04 FALSE
4 2 155 732 2015-01-02 FALSE
5 2 218 632 2015-01-03 FALSE
6 2 632 422 2015-01-04 TRUE
Per suggestions in comments below if you want to group by worker ID but use ifelse (clunky):
df <-df %>%
group_by(worker.id) %>%
mutate(same=ifelse(start.station!=lag(end.station) &
start.station!=NA, "FALSE","TRUE")) %>%
mutate(same=ifelse(is.na(same), "FALSE","TRUE"))
as.data.frame(df)
worker.id start.station end.station day same
1 1 234 342 2015-01-02 FALSE
2 1 342 425 2015-01-03 TRUE
3 1 235 621 2015-01-04 FALSE
4 2 155 732 2015-01-02 FALSE
5 2 218 632 2015-01-03 FALSE
6 2 632 422 2015-01-04 TRUE
Upvotes: 3