wylie

Reputation: 193

R: using group_by to add rows to a data frame

New R user here. I have a dataframe that looks like this:

│ uID │ day │ status │
├─────┼─────┼────────┤
│ A   │ 5   │ 1      │
│ B   │ 4   │ 1      │
│ C   │ 9   │ 1      │

I want to add new rows to this table so that each uID has one row for every day from 1 up to its value of day. For all the days before that final day, the uID's status should be 0; the final day keeps its status of 1.

For example:

│ uID │ day │ status │
├─────┼─────┼────────┤
│ A   │ 1   │ 0      │
│ A   │ 2   │ 0      │
│ A   │ 3   │ 0      │
│ A   │ 4   │ 0      │
│ A   │ 5   │ 1      │
│ B   │ 1   │ 0      │
│ B   │ 2   │ 0      │
│ B   │ 3   │ 0      │
│ B   │ 4   │ 1      │

There's definitely an ugly way to do this with some for loops, but is there a more elegant way, perhaps using something like group_by?

Thanks!

Upvotes: 2

Views: 63

Answers (1)

akrun

Reputation: 887168

We can use group_by on 'uID' and summarise to create the columns 'status' and 'day'. With dplyr version >= 1.0, summarise is no longer constrained to return only a single row per group.

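library(dplyr)
df1 %>%
   group_by(uID) %>%
   # one row per day: status is 0 for days 1..(day - 1) and 1 on the last day
   summarise(status = rep(c(0, 1), c(day-1, 1)), 
           day = seq(day), .groups = 'drop') %>%
   # restore the original column order (uID, day, status)
   select(names(df1))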

-output

# A tibble: 18 x 3
#   uID     day status
#   <chr> <int>  <dbl>
# 1 A         1      0
# 2 A         2      0
# 3 A         3      0
# 4 A         4      0
# 5 A         5      1
# 6 B         1      0
# 7 B         2      0
# 8 B         3      0
# 9 B         4      1
#10 C         1      0
#11 C         2      0
#12 C         3      0
#13 C         4      0
#14 C         5      0
#15 C         6      0
#16 C         7      0
#17 C         8      0
#18 C         9      1
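Note that in dplyr 1.1.0 and later, returning more than one row per group from summarise is deprecated (it still works but emits a warning), and reframe is the suggested replacement. A minimal sketch of the same idea with reframe, assuming dplyr >= 1.1.0 is installed; here status is computed by comparing each day to the group's last day instead of using rep:

```r
library(dplyr)  # reframe() needs dplyr >= 1.1.0

df1 <- data.frame(uID = c("A", "B", "C"), day = c(5, 4, 9), status = c(1, 1, 1))

out <- df1 %>%
  group_by(uID) %>%
  # reframe() may return any number of rows per group and returns an ungrouped tibble
  reframe(day = seq_len(day),                 # 1, 2, ..., original day
          status = +(day == max(day)))        # uses the new day: 1 only on the last day

out
```

Because later arguments in reframe see the newly created day, max(day) here is the original day value for each uID.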

Another option is to turn each day into the sequence 1..day, unnest it, and flag the last row of each uID:

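library(dplyr)
library(tidyr)
library(purrr)
df1 %>%
    # replace each day value with the sequence 1..day (a list column)
    mutate(day = map(day, seq)) %>%
    # expand the list column into one row per day
    unnest(c(day)) %>%
    group_by(uID) %>%
    # status is 1 only on the last row of each uID
    mutate(status = +(row_number() == n())) %>%
    ungroup()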
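A similar sketch uses tidyr's uncount, which repeats each row as many times as the value in a weights column (here day). This is an alternative I'm suggesting, not part of the original answer; it assumes uID values are unique:

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(uID = c("A", "B", "C"), day = c(5, 4, 9), status = c(1, 1, 1))

out <- df1 %>%
  # repeat each row `day` times; .remove = FALSE keeps the day column
  uncount(day, .remove = FALSE) %>%
  group_by(uID) %>%
  mutate(day = row_number(),          # overwrite with 1, 2, ..., original day
         status = +(day == n())) %>%  # 1 only on the last row of each uID
  ungroup()

out
```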

data

df1 <- structure(list(uID = c("A", "B", "C"), day = c(5, 4, 9), status = c(1, 
1, 1)), class = "data.frame", row.names = c(NA, -3L))
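If you'd rather avoid extra packages, a base R sketch (my own suggestion, not from the answer above) can build the expanded table directly with rep and sequence, then set status to 1 at the running total of day, which is the position of each uID's last row:

```r
df1 <- data.frame(uID = c("A", "B", "C"), day = c(5, 4, 9), status = c(1, 1, 1))

out <- data.frame(
  uID = rep(df1$uID, times = df1$day),  # each uID repeated `day` times
  day = sequence(df1$day),              # 1..day within each uID
  status = 0
)
# the last row of each uID's block sits at the cumulative sum of day
out$status[cumsum(df1$day)] <- 1

out
```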

Upvotes: 2
