Reshaping from wide to long and vice versa (multistate/survival analysis dataset)

Question

I am trying to reshape the following dataset with reshape(), without much success.

The starting dataset is in "wide" form, with one row per id. It is intended for multistate analysis (a generalization of survival analysis).

Each person is observed over a given overall time span. During this period the subject can experience a number of transitions among states (for simplicity, let us fix the maximum number of distinct states that can be visited at two). The first visited state is s1, which takes one of the values 1, 2, 3, 4. The person stays in that state for dur1 time periods, and the same applies to the second visited state s2 and its duration dur2:

id    cohort    s1    dur1    s2    dur2
1       1       3      4      2      5
2       0       1      4      4      3

The dataset in long format which I would like to obtain is:

id    cohort    s    
1       1       3
1       1       3
1       1       3
1       1       3
1       1       2
1       1       2
1       1       2
1       1       2
1       1       2
2       0       1
2       0       1
2       0       1
2       0       1
2       0       4
2       0       4
2       0       4

In practice, each id has dur1 + dur2 rows, and s1 and s2 are melted into a single variable s.

How would you do this transformation? Also, how would you come back to the original "wide" form of the dataset?

Many thanks!

dat <- cbind(id=c(1,2), cohort=c(1, 0), s1=c(3, 1), dur1=c(4, 4), s2=c(2, 4), dur2=c(5, 3))

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer

You can use reshape() for the first step, but then you need to do some more work. Also, reshape() needs a data.frame() as its input, but your sample data is a matrix.
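As a small illustration of that point, here is a sketch that checks the class of the sample data and converts it. The name dat_df is only an assumption used for this example; the steps below wrap dat in data.frame() inline instead.

```r
# Sample data from the question: cbind() on plain vectors builds a matrix.
dat <- cbind(id = c(1, 2), cohort = c(1, 0), s1 = c(3, 1), dur1 = c(4, 4),
             s2 = c(2, 4), dur2 = c(5, 3))
is.matrix(dat)

# Convert to a data.frame before handing it to reshape().
# dat_df is a hypothetical name used only in this sketch.
dat_df <- as.data.frame(dat)
is.data.frame(dat_df)
str(dat_df)
```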

Here's how to proceed:

reshape() your data from wide to long:

dat2 <- reshape(data.frame(dat), direction = "long", 
                idvar = c("id", "cohort"),
                varying = 3:ncol(dat), sep = "")
dat2
#       id cohort time s dur
# 1.1.1  1      1    1 3   4
# 2.0.1  2      0    1 1   4
# 1.1.2  1      1    2 2   5
# 2.0.2  2      0    2 4   3

"Expand" the resulting data.frame using rep()

dat3 <- dat2[rep(seq_len(nrow(dat2)), dat2$dur), c("id", "cohort", "s")]
dat3[order(dat3$id), ]
#         id cohort s
# 1.1.1    1      1 3
# 1.1.1.1  1      1 3
# 1.1.1.2  1      1 3
# 1.1.1.3  1      1 3
# 1.1.2    1      1 2
# 1.1.2.1  1      1 2
# 1.1.2.2  1      1 2
# 1.1.2.3  1      1 2
# 1.1.2.4  1      1 2
# 2.0.1    2      0 1
# 2.0.1.1  2      0 1
# 2.0.1.2  2      0 1
# 2.0.1.3  2      0 1
# 2.0.2    2      0 4
# 2.0.2.1  2      0 4
# 2.0.2.2  2      0 4

You can get rid of the funky row names too by using rownames(dat3) <- NULL.
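A quick sanity check can confirm that the expansion did what the question asks, namely that each id ends up with dur1 + dur2 rows. This is only a sketch on the sample data; rows_per_id and expected are names made up for the example.

```r
# Rebuild the objects from the steps above so the check runs on its own.
dat <- cbind(id = c(1, 2), cohort = c(1, 0), s1 = c(3, 1), dur1 = c(4, 4),
             s2 = c(2, 4), dur2 = c(5, 3))
dat2 <- reshape(data.frame(dat), direction = "long",
                idvar = c("id", "cohort"),
                varying = 3:ncol(dat), sep = "")
dat3 <- dat2[rep(seq_len(nrow(dat2)), dat2$dur), c("id", "cohort", "s")]

# Rows per id in the expanded data, in increasing id order.
rows_per_id <- as.vector(table(dat3$id))

# Expected count per id (dur1 + dur2), arranged in the same id order.
expected <- unname(rowSums(dat[, c("dur1", "dur2")]))[order(dat[, "id"])]

# TRUE when every id has exactly dur1 + dur2 rows.
all(rows_per_id == expected)
```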

Update: Retaining the ability to revert to the original form

In the example above, since we dropped the "time" and "dur" variables, it isn't possible to directly revert to the original dataset. If you think you will need to do that, I suggest keeping those columns in the expanded data.frame and creating a separate data.frame with only the columns you need for the analysis.
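A minimal sketch of that suggestion, assuming dat and dat2 are built as in the first part of the answer: expand with rep() as before, but without the column subset, so that "time" and "dur" stay in dat3. The name dat3_small is hypothetical and only stands for the trimmed copy used for analysis.

```r
# Sample data and the wide-to-long step from above.
dat <- cbind(id = c(1, 2), cohort = c(1, 0), s1 = c(3, 1), dur1 = c(4, 4),
             s2 = c(2, 4), dur2 = c(5, 3))
dat2 <- reshape(data.frame(dat), direction = "long",
                idvar = c("id", "cohort"),
                varying = 3:ncol(dat), sep = "")

# Expand as before, but keep every column (including "time" and "dur").
dat3 <- dat2[rep(seq_len(nrow(dat2)), dat2$dur), ]
rownames(dat3) <- NULL

# Hypothetical trimmed copy: only the columns asked for in the question,
# sorted by id and then by the order in which the states were visited.
dat3_small <- dat3[order(dat3$id, dat3$time), c("id", "cohort", "s")]
```

With this version of dat3, the steps below have the "time" and "dur" columns they rely on.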

Here's how:

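Use aggregate() to get back to "dat2" (this assumes dat3 was built without the column subset, so that it still contains "time" and "dur"):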

aggregate(cbind(s, dur) ~ ., dat3, unique)
#   id cohort time s dur
# 1  2      0    1 1   4
# 2  1      1    1 3   4
# 3  2      0    2 4   3
# 4  1      1    2 2   5

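Wrap reshape() around that to get back to the wide form of the original "dat" (the columns come out named s.1, dur.1 and so on, and the rows may be in a different order). Here, in one step: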

reshape(aggregate(cbind(s, dur) ~ ., dat3, unique), 
        direction = "wide", idvar = c("id", "cohort"))
#   id cohort s.1 dur.1 s.2 dur.2
# 1  2      0   1     4   4     3
# 2  1      1   3     4   2     5
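To check that the round trip really reproduces the starting data, something like the sketch below could be used. It assumes the version of dat3 that keeps "time" and "dur", and it passes sep = "" to reshape() so that the wide columns come out as s1, dur1, s2, dur2 rather than s.1, dur.1 and so on. The names wide and same are made up for the example.

```r
# Rebuild the long, expanded data with "time" and "dur" retained.
dat <- cbind(id = c(1, 2), cohort = c(1, 0), s1 = c(3, 1), dur1 = c(4, 4),
             s2 = c(2, 4), dur2 = c(5, 3))
dat2 <- reshape(data.frame(dat), direction = "long",
                idvar = c("id", "cohort"),
                varying = 3:ncol(dat), sep = "")
dat3 <- dat2[rep(seq_len(nrow(dat2)), dat2$dur), ]

# Collapse to one row per id/cohort/time, then go back to wide.
# sep = "" makes reshape() name the columns s1, dur1, s2, dur2.
wide <- reshape(aggregate(cbind(s, dur) ~ ., dat3, unique),
                direction = "wide", idvar = c("id", "cohort"),
                timevar = "time", sep = "")

# Restore the original row and column order before comparing.
wide <- wide[order(wide$id), colnames(dat)]
rownames(wide) <- NULL

# Element-wise comparison against the original matrix.
same <- all(as.matrix(wide) == dat)
same
```

The comparison is done element by element on purpose, since reshape() attaches extra attributes to its result that a stricter check such as identical() would flag.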
