How to code a factor variable when a value lies between two other factors either with a new column or by adding levels?

Question

I have the following df:

    id    time              x     y      pickup_dropoff
    1    2/1/2013 12:23    73    40       pickup
    1    2/1/2013 12:25    73    40.2     ping
    1    2/1/2013 12.27    73    40.5     ping
    1    2/1/2013 12:34    73    41       dropoff
    1    2/1/2013 12:35    73    41.4     ping
    1    1/1/2013 12:45   73.6   41       pickup
    1    1/1/2013 12:57   73.5   41       dropoff
    2    1/2/2013 12:54   73.6   42       ping   
    2    1/2/2013 13:00   73.45  42       pickup
    2    1/2/2013 14:00   73     42       dropoff
    2    1/2/2013 14:50   73.11  41       pickup
    2    1/2/2013 15:30   73     44       dropoff
    2    1/2/2013 16:00   73.1   41       pickup
    2    1/2/2013 18:00    74    42       dropoff

Thanks to the help I received in this post, "Reshape Data partially from Wide to Long in R", I was able to reshape the data to resemble the above. I'm now looking to recode some of the factor values to show when a vehicle is in use or is cruising without being in use. This new variable would make the following assumptions:

if a ping is between a pickup and a dropoff, the vehicle is in use
if a ping is between a dropoff and a pickup, the vehicle is out of use

I'd like the output to look like the following:

        id    time              x     y      pickup_dropoff     status
         1    2/1/2013 12:23    73    40       pickup           pickup
         1    2/1/2013 12:25    73    40.2     ping              inuse      
         1    2/1/2013 12.27    73    40.5     ping              inuse
         1    2/1/2013 12:34    73    41       dropoff           dropoff
         1    2/1/2013 12:35    73    41.4     ping              nouse
         1    1/1/2013 12:45   73.6   41       pickup            pickup
         1    1/1/2013 12:57   73.5   41       dropoff           dropoff
         2    1/2/2013 12:54   73.6   42       ping              unknown
         2    1/2/2013 13:00   73.45  42       pickup            pickup 
         2    1/2/2013 14:00   73     42       dropoff           dropoff
         2    1/2/2013 14:50   73.11  41       pickup            pickup
         2    1/2/2013 15:30   73     44       dropoff           dropoff
         2    1/2/2013 16:00   73.1   41       pickup            pickup 
         2    1/2/2013 18:00    74    42       dropoff           dropoff

I currently have pickup_dropoff coded as a factor with 3 levels.

One solution I am playing with is adding a column with the factor levels coded as 1, 2, 3, using as.numeric to turn them into numbers, and then writing a couple of ifelse statements along the lines of the following pseudocode:

            df$status = ifelse(df$pickup_dropoff LAYS BETWEEN 3
            and 1, df$pickup_dropoff == "inuse", df$pickup_dropoff)

I may be overthinking this, but I'm not sure whether there is a way to say "in between" in R. I also have to account for another dimension, "id", since I don't want a ping that falls between events from two different ids to be considered in use. In that case it should be labeled "unknown", as the data I am working with is incomplete.

Any help is appreciated. Thanks!

Answers (1)
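
The rules in the question (a ping after a pickup is in use, a ping after a dropoff is out of use, a ping with no earlier event for that id is unknown) can be sketched in base R by carrying the most recent non-ping event forward within each id. This is only a minimal sketch under some assumptions: the rows are already in chronological order within each id, the toy data frame below is a subset of the rows shown in the question, and `label_status()` is a hypothetical helper name, not a function from any package.

```r
# Toy data: a subset of the rows from the question, assumed to be already
# ordered chronologically within each id.
df <- data.frame(
  id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
  pickup_dropoff = c("pickup", "ping", "ping", "dropoff", "ping",
                     "pickup", "dropoff", "ping", "pickup", "dropoff"),
  stringsAsFactors = TRUE
)

# label_status() is a hypothetical helper. For one vehicle's events it looks
# up the most recent pickup/dropoff at or before each row and uses it to
# label the pings; pickup and dropoff rows keep their own label.
label_status <- function(events) {
  events <- as.character(events)
  is_event <- events != "ping"
  # Row index of the latest non-ping event so far (0 if there is none yet)
  last_idx <- cummax(ifelse(is_event, seq_along(events), 0L))
  last_event <- ifelse(last_idx == 0, NA_character_,
                       events[pmax(last_idx, 1)])
  status <- events
  ping <- !is_event
  status[ping & !is.na(last_event) & last_event == "pickup"] <- "inuse"
  status[ping & !is.na(last_event) & last_event == "dropoff"] <- "nouse"
  status[ping & is.na(last_event)] <- "unknown"
  status
}

# ave() applies the helper separately within each id, so an event from one
# vehicle never affects the pings of another.
df$status <- ave(as.character(df$pickup_dropoff), df$id, FUN = label_status)
```

Because the lookup only goes backwards, a ping after the final dropoff of an id is labeled "nouse" even though no later pickup exists. If such trailing pings should be "unknown" instead, the same idea could be applied in reverse to find the next event as well.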