Fabian

Reputation: 105

R - Split numeric vector into intervals

I have a question about "splitting" a vector, though other approaches might also work. I have a data.frame (df) that looks like this (simplified version):

   case time
1   1   5
2   2   3
3   3   4

The "time" variable counts units of time (days, weeks, etc.) until an event occurs. I would like to expand the data set by adding rows that "split" each "time" value into intervals of length 1, beginning at 2. The result might then look something like this:

    case    time    begin   end
1   1       5       2       3
2   1       5       3       4
3   1       5       4       5
4   2       3       2       3
5   3       4       2       3
6   3       4       3       4

Obviously, my data set is a bit larger than this example. What would be a feasible method to achieve this result?

I had one idea of beginning with

df.exp <- df[rep(row.names(df), df$time - 2), 1:2]

in order to expand the number of rows per case, according to the number of time intervals. Based on this, a "begin" and "end" column might be added in the fashion of:

df.exp$begin <- 2:(df.exp$time-1)

However, I'm not successful at creating the respective columns, because this command only uses the first row to calculate (df.exp$time-1) and doesn't automatically distinguish by "case".

Any ideas would be very much appreciated!

Upvotes: 5

Views: 1719

Answers (2)

akrun

Reputation: 887118

You can try

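# Repeat each row once per unit interval (time - 2 intervals per case)
df2 <- df1[rep(seq_len(nrow(df1)), df1$time-2),]
row.names(df2) <- NULL
# For each case, pair consecutive values of 2:time as (begin, end)
m1 <- do.call(rbind,
          Map(function(x,y) {
                  v1 <- seq(x,y)
                  cbind(v1[-length(v1)],v1[-1L])},
                  2, df1$time))
df2[c('begin', 'end')] <- m1
df2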
#  case time begin end
#1    1    5     2   3
#2    1    5     3   4
#3    1    5     4   5
#4    2    3     2   3
#5    3    4     2   3
#6    3    4     3   4

Or an option with data.table

library(data.table)
setDT(df1)[,{tmp <- seq(2, time)
               list(time= time,
                    begin= tmp[-length(tmp)],
                    end=tmp[-1])} , by = case]
#   case time begin end
#1:    1    5     2   3
#2:    1    5     3   4
#3:    1    5     4   5
#4:    2    3     2   3
#5:    3    4     2   3
#6:    3    4     3   4

Upvotes: 7

talat

Reputation: 70266

library(data.table)
DT <- as.data.table(df)
DT[, rep(time, time-2), case][, begin := 2:(.N+1), case][, end := begin +1][]
#   case V1 begin end
#1:    1  5     2   3
#2:    1  5     3   4
#3:    1  5     4   5
#4:    2  3     2   3
#5:    3  4     2   3
#6:    3  4     3   4

Upvotes: 6
