Reputation: 133
I have a data set that looks like this
id<-c(333,333,333,342,342,342,342)
status<-c('relationship','relationship','married','single','single','married')
start<-c(17,19,25,22,36,40)
end<-c(18,23,29,29,39,44)
dat<-data.frame(id,status,start,end)
id status start end
1 333 relationship 17 18
2 333 relationship 19 23
3 333 married 25 29
4 342 single 32 35
5 342 single 36 39
6 342 single 42 44
7 342 married 45 50
My goal is to make a row for each year/age of the person under observation. For example, person 333 is observed from age 17 to 29 (from 17 to 18, this person was in a relationship, 19 to 23 in a relationship), with some gap years (which should be coded unknown). For those whose next episode start year is missing, the end status can be seen as that age's status. But if there is an overlap then the start age status prevails (e.g. if someone is in a relationship 17 to 19, then married 19 to 22, then age 19 should take married as its status). So the end result should look like this:
id start status
1 333 17 relationship
2 333 18 relationship
3 333 19 relationship
4 333 20 relationship
5 333 21 relationship
6 333 22 relationship
7 333 23 relationship
8 333 24 unknown
9 333 25 married
10 333 26 married
11 333 27 married
12 333 28 married
13 333 29 married
14 342 22 single
15 342 23 single
16 342 24 single
17 342 25 single
18 342 26 single
19 342 27 single
20 342 28 single
21 342 29 single
22 342 30 unknown
23 342 31 unknown
24 342 32 unknown
25 342 33 unknown
26 342 34 unknown
27 342 35 unknown
28 342 36 single
29 342 37 single
30 342 38 single
31 342 39 single
32 342 40 married
33 342 41 married
34 342 42 married
35 342 43 married
36 342 44 married
Is there a way of doing this without writing a for loop?
Upvotes: 0
Views: 823
Reputation: 887251
Using base R
, get the sequence :
between the corresponding elements of 'start', 'end using Map
in a list
, then rep
licate the rows of the original data based on the lengths
of the list
element and transform
by adding a new column by unlist
ing the list
output
lst1 <- Map(`:`, dat$start, dat$end)
transform(dat[rep(seq_len(nrow(dat)), lengths(lst1)), c('id', 'status')],
start = unlist(lst1))
Upvotes: 0
Reputation: 101773
A data.table
option
> setDT(dat)[, .(age = seq(start, end)), .(I = seq(nrow(dat)), id, status)]
I id status age
1: 1 333 relationship 17
2: 1 333 relationship 18
3: 2 333 relationship 19
4: 2 333 relationship 20
5: 2 333 relationship 21
6: 2 333 relationship 22
7: 2 333 relationship 23
8: 3 333 married 25
9: 3 333 married 26
10: 3 333 married 27
11: 3 333 married 28
12: 3 333 married 29
13: 4 342 single 32
14: 4 342 single 33
15: 4 342 single 34
16: 4 342 single 35
17: 5 342 single 36
18: 5 342 single 37
19: 5 342 single 38
20: 5 342 single 39
21: 6 342 single 42
22: 6 342 single 43
23: 6 342 single 44
24: 7 342 married 45
25: 7 342 married 46
26: 7 342 married 47
27: 7 342 married 48
28: 7 342 married 49
29: 7 342 married 50
I id status age
Upvotes: 0
Reputation: 145835
library(purrr)
library(dplyr)
library(tidyr)
dat %>%
mutate(age = map2(start, end, seq)) %>%
unnest(age) %>%
## deal with overlaps by keeping the max start age:
group_by(id, age) %>%
slice_max(start) %>%
ungroup
# # A tibble: 29 x 5
# id status start end age
# <dbl> <chr> <dbl> <dbl> <int>
# 1 333 relationship 17 18 17
# 2 333 relationship 17 18 18
# 3 333 relationship 19 23 19
# 4 333 relationship 19 23 20
# 5 333 relationship 19 23 21
# 6 333 relationship 19 23 22
# 7 333 relationship 19 23 23
# 8 333 married 25 29 25
# 9 333 married 25 29 26
# 10 333 married 25 29 27
# # ... with 19 more rows
This doesn't fill in any gaps, but you could use a similar idea with tidyr::complete
for that.
Upvotes: 5