Reputation: 2127
I have the following df:
id time x y pickup_dropoff
1 2/1/2013 12:23 73 40 pickup
1 2/1/2013 12:25 73 40.2 ping
1 2/1/2013 12:27 73 40.5 ping
1 2/1/2013 12:34 73 41 dropoff
1 2/1/2013 12:35 73 41.4 ping
1 1/1/2013 12:45 73.6 41 pickup
1 1/1/2013 12:57 73.5 41 dropoff
2 1/2/2013 12:54 73.6 42 ping
2 1/2/2013 13:00 73.45 42 pickup
2 1/2/2013 14:00 73 42 dropoff
2 1/2/2013 14:50 73.11 41 pickup
2 1/2/2013 15:30 73 44 dropoff
2 1/2/2013 16:00 73.1 41 pickup
2 1/2/2013 18:00 74 42 dropoff
Thanks to the help I received in this post: Reshape Data partially from Wide to Long in R
I was able to reshape the data to resemble the above. I'm now looking to recode some of the factor values to show when a vehicle is in use or is cruising without being in use. This new variable would make the following assumptions:
I'd like the output to look like the following:
id time x y pickup_dropoff status
1 2/1/2013 12:23 73 40 pickup pickup
1 2/1/2013 12:25 73 40.2 ping inuse
1 2/1/2013 12:27 73 40.5 ping inuse
1 2/1/2013 12:34 73 41 dropoff dropoff
1 2/1/2013 12:35 73 41.4 ping nouse
1 1/1/2013 12:45 73.6 41 pickup pickup
1 1/1/2013 12:57 73.5 41 dropoff dropoff
2 1/2/2013 12:54 73.6 42 ping unknown
2 1/2/2013 13:00 73.45 42 pickup pickup
2 1/2/2013 14:00 73 42 dropoff dropoff
2 1/2/2013 14:50 73.11 41 pickup pickup
2 1/2/2013 15:30 73 44 dropoff dropoff
2 1/2/2013 16:00 73.1 41 pickup pickup
2 1/2/2013 18:00 74 42 dropoff dropoff
I currently have pickup_dropoff coded as a factor with 3 levels.
One solution I am playing with is adding a column with the factor levels 1, 2, 3, then using as.numeric to turn them into numbers, and then writing a couple of if statements like the following:
df$status = ifelse(df$pickup_dropoff LAYS BETWEEN 3
and 1, df$pickup_dropoff == "inuse", df$pickup_dropoff)
I may be overthinking this, but I'm not sure if there is a way to say "in between" in R. Also I have to deal with another dimension "id" since I don't want a ping between two different ids to be considered in use. In any case it would be considered "unknown" as the data I am working with is incomplete.
Any help is appreciated. Thanks!
Upvotes: 2
Views: 165
Reputation: 145965
I think this will work:
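library(dplyr)
df %>%
  # value each pickup/dropoff row should pass on to the pings that follow it
  mutate(
    status = ifelse(pickup_dropoff == "pickup", "inuse",
             ifelse(pickup_dropoff == "dropoff", "nouse", NA))
  ) %>%
  group_by(id) %>%
  mutate(
    # carry the last non-NA value forward within each id
    status = zoo::na.locf(status, na.rm = FALSE),
    # as.character() so a factor column yields its labels, not integer codes
    status = ifelse(pickup_dropoff %in% c("pickup", "dropoff"),
                    as.character(pickup_dropoff), status),
    status = ifelse(is.na(status), "unknown", status)
  )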
First we put in the values that we want the new column to take after each pickup and dropoff, leaving everything else as NA. Then we fill in the missing values using zoo::na.locf (grouped by id). Lastly, we reset the values at pickup and dropoff to what we actually want, and any NA that is still left becomes "unknown".
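To make the fill step concrete, here is a minimal sketch of what zoo::na.locf does to a single group. The vector name `marker` and its values are made up for illustration; they stand in for the intermediate status column of one id.

```r
library(zoo)

# Hypothetical intermediate column for one id: NA wherever the row is a ping
marker <- c(NA, "inuse", NA, NA, "nouse", NA)

# na.rm = FALSE keeps the leading NA instead of dropping it,
# so the result has the same length as the input
filled <- na.locf(marker, na.rm = FALSE)
print(filled)
```

The leading NA has nothing before it to carry forward, so it stays NA. That is the case the last step turns into "unknown".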
This creates a character vector - you can of course stick a factor conversion at the end.
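A factor conversion could look roughly like the sketch below. The `status` vector is a made-up stand-in for the computed column, and the level order is an assumption; pick whatever order suits your analysis.

```r
# Hypothetical status values standing in for the computed column
status <- c("pickup", "inuse", "inuse", "dropoff", "nouse", "unknown")

# Explicit levels fix the ordering; this particular order is an assumption
status_f <- factor(status,
                   levels = c("pickup", "inuse", "dropoff", "nouse", "unknown"))

print(levels(status_f))
print(table(status_f))
```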
Using plyr or base instead of dplyr:
# value each pickup/dropoff row should pass on to the pings that follow it
df$status = with(df, ifelse(pickup_dropoff == "pickup", "inuse",
                     ifelse(pickup_dropoff == "dropoff", "nouse", NA)))

## pick one
# base
df$status = ave(df$status, df$id, FUN = function(x) zoo::na.locf(x, na.rm = FALSE))
# plyr
df = plyr::ddply(df, "id", plyr::mutate, status = zoo::na.locf(status, na.rm = FALSE))

# as.character() so a factor column yields its labels, not integer codes
df$status = with(df, ifelse(pickup_dropoff %in% c("pickup", "dropoff"),
                            as.character(pickup_dropoff), status))
df$status = with(df, ifelse(is.na(status), "unknown", status))
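As a sanity check, here is a minimal sketch that runs the base variant on a cut-down copy of the question's data: only the id and pickup_dropoff columns, with all of id 1 and the first three rows of id 2. Since the question says pickup_dropoff is a factor, it is built as one here, and as.character() is used when copying its labels because ifelse() would otherwise return the factor's integer codes.

```r
library(zoo)

# Cut-down copy of the question's data: id 1 in full, first three rows of id 2
df <- data.frame(
  id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
  pickup_dropoff = factor(c("pickup", "ping", "ping", "dropoff", "ping",
                            "pickup", "dropoff", "ping", "pickup", "dropoff"))
)

# Step 1: mark what each pickup/dropoff should pass on to later pings
df$status <- with(df, ifelse(pickup_dropoff == "pickup", "inuse",
                      ifelse(pickup_dropoff == "dropoff", "nouse", NA)))

# Step 2: carry the last marker forward, separately for each id
df$status <- ave(df$status, df$id,
                 FUN = function(x) na.locf(x, na.rm = FALSE))

# Step 3: pickup and dropoff rows keep their own label
df$status <- with(df, ifelse(pickup_dropoff %in% c("pickup", "dropoff"),
                             as.character(pickup_dropoff), status))

# Step 4: pings with nothing before them in their id are unknown
df$status <- with(df, ifelse(is.na(status), "unknown", status))

print(df)
```

On these rows the status column should line up with the desired output in the question, including the "unknown" ping at the start of id 2.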
Upvotes: 2