Christian

Reputation: 243

R find intervals in data.table

I want to add a new column holding the interval index defined by breakpoints, computed by group. As an example:

This is my data.table:

x <- data.table(a = c(1:8,1:8), b = c(rep("A",8),rep("B",8)))

I already have the breakpoints (row indices) for each group:

pos <- data.table(b =  c("A","A","B","B"), bp = c(3,5,2,4))

I can find the intervals for group "A" with:

findInterval(1:nrow(x[b=="A"]), pos[b=="A"]$bp)

How can I do this for each group, in this case "A" and "B"?

Upvotes: 2

Views: 420

Answers (3)

Ronak Shah

Reputation: 388817

We can nest the pos data into a list column by b, join it with x, and use findInterval to get the corresponding intervals.

library(dplyr)

pos %>% 
   tidyr::nest(data = bp) %>%
   right_join(x, by = 'b') %>%
   group_by(b) %>%
   mutate(interval = findInterval(a, data[[1]][[1]])) %>%
   select(-data)

#    b        a interval
#   <chr> <int>    <int>
# 1 A         1        0
# 2 A         2        0
# 3 A         3        1
# 4 A         4        1
# 5 A         5        2
# 6 A         6        2
# 7 A         7        2
# 8 A         8        2
# 9 B         1        0
#10 B         2        1
#11 B         3        1
#12 B         4        2
#13 B         5        2
#14 B         6        2
#15 B         7        2
#16 B         8        2

Upvotes: 0

chinsoon12

Reputation: 25225

Another option is a rolling join in data.table:

pos[, ri := rowid(b)]
x[, intvl := fcoalesce(pos[x, on=.(b, bp=a), roll=Inf, ri], 0L)]

output:

    a b intvl
 1: 1 A     0
 2: 2 A     0
 3: 3 A     1
 4: 4 A     1
 5: 5 A     2
 6: 6 A     2
 7: 7 A     2
 8: 8 A     2
 9: 1 B     0
10: 2 B     1
11: 3 B     1
12: 4 B     2
13: 5 B     2
14: 6 B     2
15: 7 B     2
16: 8 B     2
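
To see why this works, it can help to look at the join on its own. The sketch below rebuilds the question's data and inspects the intermediate result; the name joined is introduced here for illustration only. With roll=Inf, each row of x picks up the ri of the last breakpoint at or before a, and rows that lie before the first breakpoint of their group get NA, which fcoalesce then turns into 0.

```r
library(data.table)

# Data from the question
x   <- data.table(a = c(1:8, 1:8), b = c(rep("A", 8), rep("B", 8)))
pos <- data.table(b = c("A", "A", "B", "B"), bp = c(3, 5, 2, 4))

# Number the breakpoints within each group (1, 2, ... per value of b)
pos[, ri := rowid(b)]

# Rolling join: match b exactly and roll the last breakpoint <= a forward
# (joined is a hypothetical helper name, not part of the answer above)
joined <- pos[x, on = .(b, bp = a), roll = Inf]

# Rows of x before the first breakpoint of their group have no match: ri is NA
joined[is.na(ri)]

# Replace those NAs with 0 to get the interval index
x[, intvl := fcoalesce(joined$ri, 0L)]
```

Note that this assumes the breakpoints in pos are listed in increasing order within each group, since ri is just the row number within b.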

Upvotes: 0

akrun

Reputation: 886948

One option is to split both datasets by the 'b' column, use Map to loop over the corresponding list elements, and apply findInterval:

Map(function(u, v) findInterval(seq_len(nrow(u)), v$bp), 
      split(x, x$b), split(pos, pos$b))
#$A
#[1] 0 0 1 1 2 2 2 2

#$B
#[1] 0 1 1 2 2 2 2 2

Another option is to group 'x' by 'b' and call findInterval on the 'bp' values from 'pos', subset with a logical condition based on .BY:

x[, findInterval(seq_len(.N), pos$bp[pos$b==.BY]), b]
#    b V1
# 1: A  0
# 2: A  0
# 3: A  1
# 4: A  1
# 5: A  2
# 6: A  2
# 7: A  2
# 8: A  2
# 9: B  0
#10: B  1
#11: B  1
#12: B  2
#13: B  2
#14: B  2
#15: B  2
#16: B  2
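
Since the question asks for a new column, the same grouped call can probably be assigned by reference with := instead of returning a new table. This is a minimal sketch using the question's x and pos; the column name interval is my own choice, and .BY$b is used here to pull the group value out of the .BY list explicitly.

```r
library(data.table)

# Data from the question
x   <- data.table(a = c(1:8, 1:8), b = c(rep("A", 8), rep("B", 8)))
pos <- data.table(b = c("A", "A", "B", "B"), bp = c(3, 5, 2, 4))

# For each group of x, look up that group's breakpoints in pos and
# store the interval of each row index as a new column (added in place)
x[, interval := findInterval(seq_len(.N), pos$bp[pos$b == .BY$b]), by = b]

x
```

As with findInterval in general, this relies on the breakpoints for each group being sorted in increasing order.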

Upvotes: 3
