Al Mac
Al Mac

Reputation: 93

Extract grouped values from a dataframe based on a range of grouped values from another dataframe using tidyverse

I am trying to extract the grouped index values from a dataframe (df1) that represent a range of grouped times (start - end) and that encompass the grouped times given in another dataframe (df2). My required output is df3.

df1<-data.frame(group = c("A","A","A","A","B","B","B","B","C","C","C","C"),index=c(1,2,3,4,5,6,7,8,9,10,11,12),start=c(5,10,15,20,5,10,15,20,5,10,15,20),end=c(10,15,20,25,10,15,20,25,10,15,20,25))
df2<-data.frame(group = c("A","B","B","C","A","C"),time=c(11,17,24,5,5,22))
df3<-data.frame(time=c(11,17,24,5,5,22),index=c(2,7,8,9,1,12))

A previous related question I posted was answered with a neat pipe solution for ungrouped data:

    library(tidyverse)
df1 %>% 
    select(from = start, to = end) %>% 
    pmap(seq) %>% 
    do.call(cbind, .) %>% 
    list(.) %>%
    mutate(df2, new = ., 
                ind = map2(time, new, ~ which(.x == .y, arr.ind = TRUE)[,2])) %>%
    select(-new)

Can this be modified to group by the 'group' column in both df1 and df2 to give the output df3?

Upvotes: 3

Views: 109

Answers (2)

akrun
akrun

Reputation: 887501

With group_by, we can nest and then do a join

library(tidyverse)
df1 %>% 
  group_by(group) %>%
  nest(-group)  %>%
  mutate(new = map(data, ~.x %>% 
  select(from = start, to = end) %>%
  pmap(seq) %>% 
  do.call(cbind, .) %>% 
  list(.))) %>%
  right_join(df2) %>%
  mutate(ind = map2_int(time, new, ~ which(.x == .y[[1]], arr.ind = TRUE)[,2]),
          ind = map2_dbl(ind, data, ~ .y$index[.x])) %>%
  select(time, ind)
# A tibble: 6 x 2
#   time   ind
#  <dbl> <dbl>
#1 11.0   2.00
#2 17.0   7.00
#3 24.0   8.00
#4  5.00  9.00
#5  5.00  1.00
#6 22.0  12.0 

Upvotes: 2

Orhan Yazar
Orhan Yazar

Reputation: 909

Here is something nice with data.table,

df1<-data.table(group = c("A","A","A","A","B","B","B","B","C","C","C","C"),index=c(1,2,3,4,5,6,7,8,9,10,11,12),start=c(5,10,15,20,5,10,15,20,5,10,15,20),end=c(10,15,20,25,10,15,20,25,10,15,20,25))
df2<-data.table(group = c("A","B","B","C","A","C"),time=c(11,17,24,5,5,22))


df1[df2,on=.(group,start<=time,end>=time)][,c("start","index")]


   start index
1:    11     2
2:    17     7
3:    24     8
4:     5     9
5:     5     1
6:    22    12

you can then rename the start column to time and you got your answer i think.

Upvotes: 1

Related Questions