Reputation: 133

Use range in column values to create new rows in R

I have a data set that looks like this

id<-c(333,333,333,342,342,342,342)
status<-c('relationship','relationship','married','single','single','married')
start<-c(17,19,25,22,36,40)
end<-c(18,23,29,29,39,44)

dat<-data.frame(id,status,start,end)

   id       status start end
1 333 relationship    17  18
2 333 relationship    19  23
3 333      married    25  29
4 342       single    32  35
5 342       single    36  39
6 342       single    42  44
7 342      married    45  50

My goal is to make a row for each year/age of the person under observation. For example, person 333 is observed from age 17 to 29 (from 17 to 18, this person was in a relationship, 19 to 23 in a relationship), with some gap years (which should be coded unknown). For those whose next episode start year is missing, the end status can be seen as that age's status. But if there is an overlap then the start age status prevails (e.g. if someone is in a relationship 17 to 19, then married 19 to 22, then age 19 should take married as its status). So the end result should look like this:

   id start       status
1  333    17 relationship
2  333    18 relationship
3  333    19 relationship
4  333    20 relationship
5  333    21 relationship
6  333    22 relationship
7  333    23 relationship
8  333    24      unknown
9  333    25      married
10 333    26      married
11 333    27      married
12 333    28      married
13 333    29      married
14 342    22       single
15 342    23       single
16 342    24       single
17 342    25       single
18 342    26       single
19 342    27       single
20 342    28       single
21 342    29       single
22 342    30      unknown
23 342    31      unknown
24 342    32      unknown
25 342    33      unknown
26 342    34      unknown
27 342    35      unknown
28 342    36       single
29 342    37       single
30 342    38       single
31 342    39       single
32 342    40      married
33 342    41      married
34 342    42      married
35 342    43      married
36 342    44      married

Is there a way of doing this without writing a for loop?

Upvotes: 0

Answers (3)

akrun

Reputation: 887251

Using base R, get the sequence : between the corresponding elements of 'start', 'end using Map in a list, then replicate the rows of the original data based on the lengths of the list element and transform by adding a new column by unlisting the list output

lst1 <- Map(`:`, dat$start, dat$end)
transform(dat[rep(seq_len(nrow(dat)), lengths(lst1)), c('id', 'status')], 
   start = unlist(lst1))

Upvotes: 0

ThomasIsCoding

Reputation: 101773

A data.table option

> setDT(dat)[, .(age = seq(start, end)), .(I = seq(nrow(dat)), id, status)]
    I  id       status age
 1: 1 333 relationship  17
 2: 1 333 relationship  18
 3: 2 333 relationship  19
 4: 2 333 relationship  20
 5: 2 333 relationship  21
 6: 2 333 relationship  22
 7: 2 333 relationship  23
 8: 3 333      married  25
 9: 3 333      married  26
10: 3 333      married  27
11: 3 333      married  28
12: 3 333      married  29
13: 4 342       single  32
14: 4 342       single  33
15: 4 342       single  34
16: 4 342       single  35
17: 5 342       single  36
18: 5 342       single  37
19: 5 342       single  38
20: 5 342       single  39
21: 6 342       single  42
22: 6 342       single  43
23: 6 342       single  44
24: 7 342      married  45
25: 7 342      married  46
26: 7 342      married  47
27: 7 342      married  48
28: 7 342      married  49
29: 7 342      married  50
    I  id       status age

Upvotes: 0

Gregor Thomas

Reputation: 145835

library(purrr)
library(dplyr)
library(tidyr)
dat %>%
  mutate(age = map2(start, end, seq)) %>%
  unnest(age) %>%
  ## deal with overlaps by keeping the max start age:
  group_by(id, age) %>%
  slice_max(start) %>%
  ungroup
# # A tibble: 29 x 5
#       id status       start   end   age
#    <dbl> <chr>        <dbl> <dbl> <int>
#  1   333 relationship    17    18    17
#  2   333 relationship    17    18    18
#  3   333 relationship    19    23    19
#  4   333 relationship    19    23    20
#  5   333 relationship    19    23    21
#  6   333 relationship    19    23    22
#  7   333 relationship    19    23    23
#  8   333 married         25    29    25
#  9   333 married         25    29    26
# 10   333 married         25    29    27
# # ... with 19 more rows

This doesn't fill in any gaps, but you could use a similar idea with tidyr::complete for that.

Upvotes: 5

Use range in column values to create new rows in R

Answers (3)

Related Questions