Reputation: 105
I have a question about "splitting" a vector, although other approaches might also work. I have a data.frame (df) that looks like this (simplified version):
  case time
1    1    5
2    2    3
3    3    4
The "time" variable counts units of time (days, weeks, etc.) until an event occurs. I would like to expand the data set by adding rows that "split" the "time" into intervals of length 1, beginning at 2. The result might then look something like this:
  case time begin end
1    1    5     2   3
2    1    5     3   4
3    1    5     4   5
4    2    3     2   3
5    3    4     2   3
6    3    4     3   4
Obviously, my data set is a bit larger than this example. What would be a feasible method to achieve this result?
I had one idea of beginning with
df.exp <- df[rep(row.names(df), df$time - 2), 1:2]
in order to expand the number of rows per case, according to the number of time intervals. Based on this, a "begin" and "end" column might be added in the fashion of:
df.exp$begin <- 2:(df.exp$time-1)
However, I'm not successful at creating the respective columns, because this command only uses the first row to calculate (df.exp$time-1) and doesn't automatically distinguish by "case".
Any ideas would be very much appreciated!
Upvotes: 5
Views: 1719
Reputation: 887118
You can try
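# df1 is the data frame from the question (called df there)
# Repeat each row once per unit interval between 2 and time
df2 <- df1[rep(seq_len(nrow(df1)), df1$time - 2), ]
row.names(df2) <- NULL
# For each time, pair consecutive values of 2:time as (begin, end) rows
m1 <- do.call(rbind,
              Map(function(x, y) {
                    v1 <- seq(x, y)
                    cbind(v1[-length(v1)], v1[-1L])
                  },
                  2, df1$time))
df2[c('begin', 'end')] <- m1
df2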
#   case time begin end
# 1    1    5     2   3
# 2    1    5     3   4
# 3    1    5     4   5
# 4    2    3     2   3
# 5    3    4     2   3
# 6    3    4     3   4
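The begin and end columns can also be built without the Map/rbind step. Here is a minimal sketch using base R's sequence(), which concatenates the runs 1..n for each element of its argument. It assumes every time is at least 2 (a case with time equal to 2 simply gets no rows), and it rebuilds the example data under the name df1 so that it runs on its own.

```r
# Example data from the question
df1 <- data.frame(case = 1:3, time = c(5, 3, 4))

# Number of unit intervals per case: [2,3], [3,4], ..., [time-1, time]
n <- df1$time - 2

# Repeat each row once per interval
df2 <- df1[rep(seq_len(nrow(df1)), n), ]
row.names(df2) <- NULL

# sequence(n) gives 1..n[1], 1..n[2], ...; adding 1 makes each run start at 2
df2$begin <- sequence(n) + 1
df2$end <- df2$begin + 1
df2
# begin: 2 3 4 2 2 3
# end:   3 4 5 3 3 4
```

Because everything is vectorised, this should scale to a larger data set without looping over cases.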
Or an option with data.table
library(data.table)
setDT(df1)[, {
    tmp <- seq(2, time)
    list(time  = time,
         begin = tmp[-length(tmp)],
         end   = tmp[-1])
}, by = case]
#    case time begin end
# 1:    1    5     2   3
# 2:    1    5     3   4
# 3:    1    5     4   5
# 4:    2    3     2   3
# 5:    3    4     2   3
# 6:    3    4     3   4
Upvotes: 7
Reputation: 70266
library(data.table)
DT <- as.data.table(df)
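DT[, rep(time, time - 2), case][   # one row per interval; the unnamed column becomes V1
   , begin := 2:(.N + 1), case][   # count up from 2 within each case
   , end := begin + 1][]           # end is one unit after begin; trailing [] prints the result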
#    case V1 begin end
# 1:    1  5     2   3
# 2:    1  5     3   4
# 3:    1  5     4   5
# 4:    2  3     2   3
# 5:    3  4     2   3
# 6:    3  4     3   4
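The per-case counter used for begin above can also be written in base R with ave(), which picks up where the question's df.exp attempt left off. This is only a sketch: it rebuilds the example df so that it runs on its own, and it assumes each case appears once in df with time of at least 2.

```r
# Example data from the question
df <- data.frame(case = 1:3, time = c(5, 3, 4))

# Expansion step from the question: one row per unit interval
df.exp <- df[rep(row.names(df), df$time - 2), 1:2]
row.names(df.exp) <- NULL

# Within each case, number the rows 2, 3, ... (the analogue of 2:(.N + 1))
df.exp$begin <- ave(df.exp$time, df.exp$case,
                    FUN = function(x) seq_along(x) + 1)
df.exp$end <- df.exp$begin + 1
df.exp
# begin: 2 3 4 2 2 3
# end:   3 4 5 3 3 4
```

ave() applies the function separately to the rows of each case, which is the grouping that 2:(df.exp$time-1) was missing.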
Upvotes: 6