Reputation: 193
New R user here. I have a dataframe that looks like this:
│ uID │ day │ status │
├─────┼─────┼────────┤
│ A   │ 5   │ 1      │
│ B   │ 4   │ 1      │
│ C   │ 9   │ 1      │
I want to add rows to this table so that each uID has one row for every day from 1 to [[day]]. For every day before [[day]], that uID's status should be 0.
For example:
│ uID │ day │ status │
├─────┼─────┼────────┤
│ A   │ 1   │ 0      │
│ A   │ 2   │ 0      │
│ A   │ 3   │ 0      │
│ A   │ 4   │ 0      │
│ A   │ 5   │ 1      │
│ B   │ 1   │ 0      │
│ B   │ 2   │ 0      │
│ B   │ 3   │ 0      │
│ B   │ 4   │ 1      │
There's definitely an ugly way to do this with some for-loops, but I was wondering if there's a more graceful way to do it, with something like a groupby?
Thanks!
Upvotes: 2
Views: 63
Reputation: 887168
We can use group_by on 'uID' and summarise to create the columns 'status' and 'day'. With dplyr version >= 1.0, summarise is no longer restricted to returning a single row per group (since dplyr 1.1.0 this multi-row use emits a deprecation warning).
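library(dplyr)
df1 %>%
  group_by(uID) %>%
  # status: (day - 1) zeros followed by a single 1; computed before 'day' is overwritten
  summarise(status = rep(c(0, 1), c(day - 1, 1)),
            day = seq(day), .groups = 'drop') %>%
  # restore the original column order
  select(names(df1))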
Output:
# A tibble: 18 x 3
# uID day status
# <chr> <int> <dbl>
# 1 A 1 0
# 2 A 2 0
# 3 A 3 0
# 4 A 4 0
# 5 A 5 1
# 6 B 1 0
# 7 B 2 0
# 8 B 3 0
# 9 B 4 1
#10 C 1 0
#11 C 2 0
#12 C 3 0
#13 C 4 0
#14 C 5 0
#15 C 6 0
#16 C 7 0
#17 C 8 0
#18 C 9 1
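A minimal sketch of the same pipeline using reframe(), which dplyr 1.1.0 added for results with any number of rows per group. It assumes dplyr >= 1.1.0 is installed and rebuilds the sample data inline so it runs on its own; the name `out` is only for illustration.

```r
library(dplyr)

# Sample data, same values as df1 in this answer
df1 <- data.frame(uID = c("A", "B", "C"),
                  day = c(5, 4, 9),
                  status = c(1, 1, 1))

out <- df1 %>%
  group_by(uID) %>%
  # reframe() may return several rows per group and always drops the grouping
  reframe(status = rep(c(0, 1), c(day - 1, 1)),  # zeros, then a 1 on the last day
          day = seq(day)) %>%                    # 1, 2, ..., day
  select(all_of(names(df1)))                     # original column order

out
```

Because reframe() returns an ungrouped result, the `.groups = 'drop'` argument is not needed here.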
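Or another option is to turn 'day' into a list column of sequences and unnest it:
library(dplyr)
library(tidyr)
library(purrr)
df1 %>%
  # replace each day with the sequence 1..day, then expand to one row per element
  mutate(day = map(day, seq)) %>%
  unnest(c(day)) %>%
  group_by(uID) %>%
  # flag the last row of each group; unary + converts the logical to 0/1
  mutate(status = +(row_number() == n())) %>%
  ungroup()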
Data:
df1 <- structure(list(uID = c("A", "B", "C"), day = c(5, 4, 9),
    status = c(1, 1, 1)), class = "data.frame", row.names = c(NA, -3L))
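For comparison, the same expansion can probably be done without any packages. This is only a base R sketch, assuming one input row per uID and that the final day of each uID should get status 1 while all earlier days get 0; the names `out` and `last_day` are illustrative.

```r
# Sample data, same values as df1 above
df1 <- data.frame(uID = c("A", "B", "C"),
                  day = c(5, 4, 9),
                  status = c(1, 1, 1))

# Repeat each uID 'day' times and number the days 1..day within each uID
out <- data.frame(uID = rep(df1$uID, times = df1$day),
                  day = sequence(df1$day))

# Last day of each uID, aligned with the expanded rows
last_day <- rep(df1$day, times = df1$day)

# 1 on the last day, 0 on every earlier day
out$status <- as.numeric(out$day == last_day)

out
```

rep() with a `times` vector and sequence() are both vectorised, so no explicit loop or grouping step is needed.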
Upvotes: 2