Federico

Reputation: 3920

How does dplyr filter work in R?

I want to filter only the rows that are less than 10 units away from the point (1,1). My dataframe has two columns, x and y.

This is what I have tried:

filter(df, dist( rbind(c(1,2), c(x,y)) ) < 10 )

But this is not working. It always returns a 0-row result, although I know that it should return a couple of rows. How can I debug this? I would like to print every value passed to x and y in every iteration.

Per request, this is the output of dput(head(df)):

structure(list(x = c(1, 2, 3, 4, 5), y = c(1, 1, 1, 1, 1)), .Names = c("x", 
"y"), row.names = c(NA, 5L), class = "data.frame")

Upvotes: 0

Views: 753

Answers (1)

r2evans

Reputation: 160447

I would use your sample data, but every row in it passes the filter, so I will create something random:

library(dplyr)
set.seed(42)
df <- data_frame(x = sample(20, size = 20, replace = TRUE),
                 y = sample(20, size = 20, replace = TRUE))
head(df)
# Source: local data frame [6 x 2]
#       x     y
#   <int> <int>
# 1    19    19
# 2    19     3
# 3     6    20
# 4    17    19
# 5    13     2
# 6    11    11

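The problem is that dplyr::filter requires a logical vector with one element per row. If you manually check the return of dist(...), it is a "dist" object holding the pairwise distances between the rows of the matrix it was given, not one value per row of df. It is not clear how filter should be expected to use that.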
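One way to debug this is to evaluate the expression outside of filter. For an ungrouped data frame, filter evaluates its condition once against whole columns, not once per row, so there is no per-row "iteration" to print. The sketch below uses plain vectors x and y as stand-ins for the columns (the values are the sample data from the question) and only base R:

```r
# Stand-ins for the x and y columns (sample data from the question).
x <- c(1, 2, 3, 4, 5)
y <- c(1, 1, 1, 1, 1)

# Inside filter(), x and y are whole columns, so c(x, y) is one long vector
# and rbind() builds a 2-row matrix (c(1, 2) is recycled along the columns).
m <- rbind(c(1, 2), c(x, y))
dim(m)        # 2 rows, 2 * length(x) columns

# dist() measures distances between the ROWS of m.
d <- dist(m)
length(d)     # one distance in total, not one per row of the data
```

With only two rows in the matrix, there is a single pairwise distance, so the comparison with 10 produces one TRUE or FALSE for the whole data frame instead of one per row.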

If your data really is just one point (c(1, 2)), then you need to manually calculate the distance between the known point and the variables of the data.frame, such as:

filter(df, sqrt( (x - 1)^2 + (y - 2)^2 ) < 10)
# Source: local data frame [2 x 2]
#       x     y
#   <int> <int>
# 1    10     1
# 2     3     5

(I'm assuming Euclidean distance here.) If you have more dimensions and/or a slightly different distance equation, the application should be straightforward.
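As a rough illustration of the "more dimensions" case, the same calculation can be wrapped in a small vectorised helper. The name dist_to_point is hypothetical (it is not part of dplyr or base R), and the sketch assumes Euclidean distance and numeric columns of equal length:

```r
# Hypothetical helper: Euclidean distance from each row to a fixed point.
# `point` holds one coordinate per column passed in `...`.
dist_to_point <- function(point, ...) {
  cols <- cbind(...)                       # one matrix column per coordinate
  stopifnot(length(point) == ncol(cols))
  # sweep() subtracts point[j] from column j; rowSums() adds the squares.
  sqrt(rowSums(sweep(cols, 2, point)^2))
}

# Sample data from the question.
df <- data.frame(x = c(1, 2, 3, 4, 5), y = c(1, 1, 1, 1, 1))

# One distance per row, so the comparison yields one logical per row.
d <- dist_to_point(c(1, 2), df$x, df$y)
near <- df[d < 10, ]
```

Because the helper returns one value per row, it should also work inside filter, e.g. filter(df, dist_to_point(c(1, 2), x, y) < 10).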

If you are instead interested in the distance between all points in df (as your call to dist implies), then you may need to use which(..., arr.ind = TRUE) and some trickery. Or perhaps do an outer join between these (df) points and other points.
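A minimal sketch of the which(..., arr.ind = TRUE) idea, using only base R. The data is the sample from the question, and the threshold of 1.5 is an arbitrary value chosen for illustration:

```r
# Sample data from the question.
pts <- data.frame(x = c(1, 2, 3, 4, 5), y = c(1, 1, 1, 1, 1))

# Expand the "dist" object into a full symmetric n-by-n matrix.
d <- as.matrix(dist(pts))

# Keep each pair once (upper triangle only) and find pairs closer than
# the illustrative threshold; the result has one row per matching pair.
close_pairs <- which(d < 1.5 & upper.tri(d), arr.ind = TRUE)

# Columns "row" and "col" index back into pts.
data.frame(from = close_pairs[, "row"],
           to   = close_pairs[, "col"],
           dist = d[close_pairs])
```

The upper.tri mask drops the zero diagonal and the mirrored duplicates, so each pair of points appears only once in the result.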

Upvotes: 5
