Conditional manipulation and extension of rows in data.table also considering previous extensions without for-loop

Question

Suppose I have two data.tables:

A <- data.table(
  idx = c(1,2,3),
  leftbound = c(1,134,1546),
  rightbound = c(65, 180, 1670),
  infA = c("infA1", "infA2", "infA3")
)

A
   idx leftbound rightbound  infA
1:   1         1         65 infA1
2:   2       134        180 infA2
3:   3      1546       1670 infA3

B <- data.table(
  breakpoint = c(150, 165, 1555),
  infB = c("infB1", "infB2", "infB3")
)

B

   breakpoint  infB
1:        150 infB1
2:        165 infB2
3:       1555 infB3

In data.table A, each row corresponds to a range from a left to a right boundary. It has an index column (idx), a left and a right boundary column (leftbound and rightbound), and an additional variable (infA). Data.table B contains points that should be inserted as breakpoints into the ranges of the first table. For example, the range in row 2, from 134 to 180, should be split at 150 and 165, giving three ranges: 134 to 150, 150 to 165, and 165 to 180. Each of these three ranges should become a new row that replaces the original, unsplit one.
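Here is a minimal base-R sketch of splitting a single range at its breakpoints. `split_range` is a hypothetical helper and is not part of data.table. The range's bounds and the breakpoints that fall inside it form a sorted vector of cut points, and each consecutive pair of cut points becomes one new row. The sketch assumes that a breakpoint lying exactly on a bound should not create an empty piece, so it keeps only breakpoints strictly inside the range.

```r
# Hypothetical helper: split the range [lb, ub] at every breakpoint strictly inside it.
split_range <- function(lb, ub, breakpoints) {
  # keep only breakpoints strictly between the bounds, in increasing order
  inside <- sort(breakpoints[breakpoints > lb & breakpoints < ub])
  cuts <- c(lb, inside, ub)
  # consecutive cut points form the new sub-ranges
  data.frame(lb = head(cuts, -1), ub = tail(cuts, -1))
}

split_range(134, 180, c(150, 165, 1555))  # three rows: 134-150, 150-165, 165-180
split_range(1, 65, c(150, 165, 1555))     # one row: 1-65 (no breakpoint inside)
```

The `head(cuts, -1)` / `tail(cuts, -1)` pairing is the same trick the accepted answer uses inside its join. That answer's join uses `>=` and `<=`, so a breakpoint equal to a bound would produce a zero-length piece there.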

The output should therefore look like this:

Output
   peak.grp   lb   ub  infA  infB
1:        1    1   65 infA1 infB1
2:        2  134  150 infA2 infB2
3:        2  150  165 infA2 infB2
4:        2  165  180 infA2 infB2
5:        3 1546 1555 infA3 infB3
6:        3 1555 1670 infA3 infB3

Is there a way to achieve this without a for-loop?

Frank · Accepted Answer

Same idea as @Alexis's answer, but vectorized instead of using lapply over the breakpoints:

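library(data.table)

# Non-equi join: for each row of A, collect the breakpoints of B inside [leftbound, rightbound]
res <- B[A, on=.(breakpoint >= leftbound, breakpoint <= rightbound), {
  # cut points: left bound, matched breakpoints (none when .N == 0), right bound
  v = c(i.leftbound, head(x.breakpoint, .N), i.rightbound)
  # labels for each cut point, used for the left (ln) and right (rn) end of each piece
  n = c(i.infA, head(x.infB, .N), i.infA)
  .(idx = idx, lb = head(v, -1), rb = tail(v, -1), ln = head(n, -1), rn = tail(n, -1))
}, by=.EACHI][, (1:2) := NULL][]  # drop the two leading join columns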

   idx   lb   rb    ln    rn
1:   1    1   65 infA1 infA1
2:   2  134  150 infA2 infB1
3:   2  150  165 infB1 infB2
4:   2  165  180 infB2 infA2
5:   3 1546 1555 infA3 infB3
6:   3 1555 1670 infB3 infA3

I'm using head(var, .N) because when no match is found, the variable is filled with NA while .N is still 0, so head(var, .N) has zero length and the NA is dropped. I think if (.N) var would also work, and might be more readable.
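The zero-length trick can be checked in plain R without running a join. The sketch below only simulates a non-matching group. `bp` and `N` are hypothetical stand-ins for `x.breakpoint` (a single NA under the default `nomatch = NA`) and `.N` (0).

```r
# Simulated non-matching group (stand-ins, not real data.table internals)
bp <- NA_real_  # what x.breakpoint holds when no breakpoint falls in the range
N  <- 0L        # .N for that group

v_head <- c(1, head(bp, N), 65)  # head(NA, 0) has length 0, so the NA is dropped
v_if   <- c(1, if (N) bp, 65)    # if (0) without else yields NULL, which c() ignores

print(v_head)
print(v_if)
```

Both forms give c(1, 65), so the range stays a single, unsplit row.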

Related: https://github.com/Rdatatable/data.table/issues/3452
