Reputation: 239
I have some data that I am trying to filter in dplyr, but I can't seem to get the code right to get it done. Here are the two data sets:
df1 <- data.frame(Chromosome = c("chr1", "chr1", "chr2", "chr3", "chr4"),
Position = c(5 ,12, 20, 25, 50), stringsAsFactors = FALSE)
> df1
Chromosome Position
1 chr1 5
2 chr1 12
3 chr2 20
4 chr3 25
5 chr4 50
df2 <- data.frame(Chromosome = c("chr1", "chr3"), From = c(1, 20),
To = c(10, 80),stringsAsFactors = FALSE)
> df2
Chromosome From To
1 chr1 1 10
2 chr3 20 80
What I would like to do is select those rows from the first table where the chromosome numbers are identical between the tables and the position is contained between the "From" and the "To" in the second table. So the output here would be:
Chromosome Position
1 chr1 5
2 chr3 25
Any suggestions on how to write this in R? In particular I'd love to use dplyr functions, but not required.
Upvotes: 3
Views: 1991
Reputation: 8343
If you are open to data.table
one method would be
library(data.table)
setDT(df1, key="Chromosome")
setDT(df2, key="Chromosome")
df1[ df2, { idx = i.From <= Position & Position <= i.To
.(Position = Position[idx])
},
by=.EACHI
]
# Chromosome Position
#1: chr1 5
#2: chr3 25
Inspired by this answer to a previous question of mine
Upvotes: 0
Reputation: 23054
library(dplyr)
left_join(df1, df2) %>%
filter(Position > From, Position < To ) %>%
select(-From, -To)
Chromosome Position
1 chr1 5
2 chr3 25
This assumes no duplicated values of Chromosome
in your data.frame with From and To.
Upvotes: 3