Brigitta Rebane
Brigitta Rebane

Reputation: 3

Fill values in column until certain value in another by group in R

I have a table where is user id and the duration of the trip in hours.

data.frame(
  check.names = FALSE,
      user_id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
   `trip.(h)` = c(0, 0.02, 170, 0.07, 12, 3, 90, 550, 1)
)
user_id trip (h)
1 0
1 0.02
1 170
1 0.07
1 12
1 3
1 90
1 550
1 1
... ...

Now I would like to add a column with a trip id for each person. Trip id should change for each person (unique user_id ) if any value in column trip is longer than 155 hours. I would do it with a loop, but the dataframe contains 12 columns and 5.7m rows so it would be highly ineffective. Is it possible to do it columnwise with dplyr? Since I have multiple user_id, I would firstly group by user id and then maybe do it with mutate()? Maybe with fill()?

Also thought of counting >155h trips for each person and then assigning 1:n() elements to each trip, but it also seems a little far fetched

This is the desired output:

data.frame(
  check.names = FALSE,
      User.id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
   `trip.(h)` = c(0, 0.02, 170, 0.07, 12, 3, 90, 550, 1),
      trip_id = c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L)
)
User id trip (h) trip_id
1 0 1
1 0.02 1
1 170 2
1 0.07 2
1 12 2
1 3 2
1 90 2
1 550 3
1 1 3
... ... ...

Upvotes: 0

Views: 1106

Answers (2)

Vinson Ciawandy
Vinson Ciawandy

Reputation: 1156

trip (h) will be read as trip..h. in R

your_data %>% 
  mutate(flag_new_trip = as.integer(trip..h. > 155)) %>% 
  group_by(user_id) %>% 
  mutate(trip_id = cumsum(flag_new_trip)+1)

Upvotes: 0

ThomasIsCoding
ThomasIsCoding

Reputation: 101129

Try cumsum like below

df  %>% 
  group_by(User_id)  %>% 
  mutate(trip_id = cumsum(trip>155)+1) %>% 
  ungroup()

Upvotes: 2

Related Questions