Steven
Steven

Reputation: 239

selecting records within a range in dplyr

I have some data that I am trying to filter in dplyr, but I can't seem to get the code right to get it done. Here are the two data sets:

df1 <- data.frame(Chromosome = c("chr1", "chr1", "chr2", "chr3", "chr4"),
                  Position = c(5 ,12, 20, 25, 50), stringsAsFactors = FALSE)

> df1
  Chromosome Position
1       chr1        5
2       chr1       12
3       chr2       20
4       chr3       25
5       chr4       50

df2 <- data.frame(Chromosome = c("chr1", "chr3"), From = c(1, 20),
                  To = c(10, 80),stringsAsFactors = FALSE)

> df2
  Chromosome From To
1       chr1    1 10
2       chr3   20 80

What I would like to do is select those rows from the first table where the chromosome numbers are identical between the tables and the position is contained between the "From" and the "To" in the second table. So the output here would be:

  Chromosome Position
1       chr1        5
2       chr3       25

Any suggestions on how to write this in R? In particular I'd love to use dplyr functions, but not required.

Upvotes: 3

Views: 1991

Answers (2)

tospig
tospig

Reputation: 8343

If you are open to data.table one method would be

library(data.table)
setDT(df1, key="Chromosome")
setDT(df2, key="Chromosome")

df1[ df2,  { idx = i.From <= Position & Position <= i.To
              .(Position = Position[idx])
          },
     by=.EACHI
     ]

#   Chromosome Position
#1:       chr1        5
#2:       chr3       25

Inspired by this answer to a previous question of mine

Upvotes: 0

Sam Firke
Sam Firke

Reputation: 23054

library(dplyr)
left_join(df1, df2) %>%
  filter(Position > From, Position < To ) %>%
  select(-From, -To)

  Chromosome Position
1       chr1        5
2       chr3       25

This assumes no duplicated values of Chromosome in your data.frame with From and To.

Upvotes: 3

Related Questions