Reputation: 22623
I have two data frames:
df
where for single x
there are multiple values y
matchDf
which defines constraints I would like to use to filter df
I would like to filter only rows where y
is lower or equal to max_y
given that x
falls in interval <x_from, x_to)
Is there any way to do this in R (without SQL I used below)?
df <- read.table(header = TRUE, text = '
x y
100 0.1
100 0.2
100 0.3
250 0.2
250 0.3
250 0.4
375 0.2
375 0.25
375 0.35
420 0.15
420 0.16
420 0.17
500 0.23
500 0.55')
matchDf <- read.table(header = TRUE, text = '
x_from x_to max_y
0 300 .2
300 500 .3
500 99999 .5
')
library(sqldf)
sqldf('select a.*
from
df a
join matchDf b on (a.x >= b.x_from
and a.x < b.x_to
and a.y <= b.max_y)'
)
Upvotes: 4
Views: 168
Reputation: 24074
You can do:
df[mapply(function(x, y) {
y <= matchDf$max_y[x >=matchDf$x_from & x < matchDf$x_to]
}, x=df$x, y=df$y), ]
# x y
# 1 100 0.10
# 2 100 0.20
# 4 250 0.20
# 7 375 0.20
# 8 375 0.25
# 10 420 0.15
# 11 420 0.16
# 12 420 0.17
# 13 500 0.23
The function mapply
permits, for each couple (x,y)
, to know if y
is lower or equal to the appropriate max_y
value and is applied to each "couple" of df
and return TRUE
or FALSE
, then df
is subsetted according to the mapply
result.
Upvotes: 3
Reputation: 44527
Try this:
df[df$y <= matchDf$max_y[cut(df$x, c(0,matchDf$x_to))],]
x y
1 100 0.10
2 100 0.20
4 250 0.20
7 375 0.20
8 375 0.25
10 420 0.15
11 420 0.16
12 420 0.17
13 500 0.23
What's happening here is that cut
is giving you which row in matchDf
each observation in df
should be in. Then, you just use that as a positional row extraction vector, and state the conditional relationship for y
using <=
.
To see how cut
works, just pull it out of the expression:
> cut(df$x, c(0,matchDf$x_to))
[1] (0,300] (0,300] (0,300] (0,300] (0,300] (0,300] (300,500] (300,500] (300,500] (300,500] (300,500] (300,500] (300,500] (300,500]
Levels: (0,300] (300,500] (500,1e+05]
The level labels are irrelevant, because [
uses the underlying integer values to extract.
Upvotes: 4