Reputation: 3
I have a table where is user id and the duration of the trip in hours.
data.frame(
check.names = FALSE,
user_id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
`trip.(h)` = c(0, 0.02, 170, 0.07, 12, 3, 90, 550, 1)
)
user_id | trip (h) |
---|---|
1 | 0 |
1 | 0.02 |
1 | 170 |
1 | 0.07 |
1 | 12 |
1 | 3 |
1 | 90 |
1 | 550 |
1 | 1 |
... | ... |
Now I would like to add a column with a trip id for each person. Trip id should change for each person (unique user_id ) if any value in column trip is longer than 155 hours. I would do it with a loop, but the dataframe contains 12 columns and 5.7m rows so it would be highly ineffective. Is it possible to do it columnwise with dplyr? Since I have multiple user_id, I would firstly group by user id and then maybe do it with mutate()? Maybe with fill()?
Also thought of counting >155h trips for each person and then assigning 1:n() elements to each trip, but it also seems a little far fetched
This is the desired output:
data.frame(
check.names = FALSE,
User.id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
`trip.(h)` = c(0, 0.02, 170, 0.07, 12, 3, 90, 550, 1),
trip_id = c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L)
)
User id | trip (h) | trip_id |
---|---|---|
1 | 0 | 1 |
1 | 0.02 | 1 |
1 | 170 | 2 |
1 | 0.07 | 2 |
1 | 12 | 2 |
1 | 3 | 2 |
1 | 90 | 2 |
1 | 550 | 3 |
1 | 1 | 3 |
... | ... | ... |
Upvotes: 0
Views: 1106
Reputation: 1156
trip (h) will be read as trip..h. in R
your_data %>%
mutate(flag_new_trip = as.integer(trip..h. > 155)) %>%
group_by(user_id) %>%
mutate(trip_id = cumsum(flag_new_trip)+1)
Upvotes: 0
Reputation: 101129
Try cumsum
like below
df %>%
group_by(User_id) %>%
mutate(trip_id = cumsum(trip>155)+1) %>%
ungroup()
Upvotes: 2