Reputation: 419
The list contains three data frames. First, I want to select the data frame with the fewest rows as my reference frame. Then I want to subset each of the other data frames based on the minimum distance to the values of the reference data frame. Here is an example:
a<- data.frame(name=c("a1","a2","a3","a4"), x=c(10,15,59,21),y=c(12,16,20,30))
b<- data.frame(name=c("b1","b2","b3","b4","b5"), x=c(8,9,2,-1,13),y=c(7,1,5,10,0))
c<- data.frame(name=c("c1","c2","c3","c4","c5","c6","c7"), x=c(1,5,6,2,3,10,-8),y=c(2,-3,7,4,6,15,8))
all<- list(a=a,b=b,c=c)
Here a is chosen as the reference because it has the fewest rows (nrow = 4). Now I want to compute the distance between every row of a and every row of b, as follows:
a1b1, a1b2, a1b3, a1b4,a1b5
a2b1, a2b2, a2b3, a2b4,a2b5
a3b1, a3b2, a3b3, a3b4,a3b5
a4b1, a4b2, a4b3, a4b4,a4b5
For each row, the b entry with the minimum distance is added to a subset of data frame b called sub_b, as follows:
> sub_b
name x y
1 b1 8 7
2 b3 2 5
3 b1 8 7
4 b3 2 5
Similarly, compute the distances between a and c, then subset c based on the minimum distance:
# a1c1, a1c2, a1c3, a1c4,a1c5, a1c6, a1c7
# a2c1, a2c2, a2c3, a2c4,a2c5, a2c6, a2c7
# a3c1, a3c2, a3c3, a3c4,a3c5, a3c6, a3c7
# a4c1, a4c2, a4c3, a4c4,a4c5, a4c6, a4c7
The sub_c data frame should then be:
# Expected Result
> sub_c
name x y
1 c3 6 7
2 c5 3 6
3 c3 6 7
4 c5 3 6
Finally, the new list is new.all <- list(a=a, sub_b=sub_b, sub_c=sub_c). My attempt so far:
lessRow <- sapply(all, nrow) # number of rows in each data frame
lessRow <- which.min(lessRow) # set the reference frame
A <- cbind(a$x, a$y) # convert data frames to two-column coordinate matrices
B <- cbind(b$x, b$y)
C <- cbind(c$x, c$y)
library(geosphere) # compute the distances
dis.ab<- distm(A, B,distGeo)
dis.ac<- distm(A, C,distGeo)
# select which point of data frame b is closest to each point of a
minm.ab <- apply(A, 1, function(x) {
  dm <- distm(x, B, fun = distGeo)
  return(which.min(dm))
})
# select which point of data frame c is closest to each point of a
minm.ac <- apply(A, 1, function(x) {
  dm <- distm(x, C, fun = distGeo)
  return(which.min(dm))
})
# subset based on the minimum distance
sub_b <- b[minm.ab, ]
sub_c <- c[minm.ac, ]
# create a new list of data frames, keeping the reference frame (a) as it is
new.all <- list(a = a, sub_b = sub_b, sub_c = sub_c)
My question is how to do this in a loop, since the real list contains more than three data frames.
Upvotes: 0
Views: 62
Reputation: 388862
We can separate the reference data frame from the remaining data frames based on the number of rows. Then, for each row of the reference data frame, calculate the distance to every row of a remaining data frame, take the index of the minimum distance, and use those indices to subset that data frame. Doing this with lapply returns a list of data frames.
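library(geosphere)

# index of the data frame with the fewest rows (the reference)
inds <- which.min(sapply(all, nrow))
ref <- all[[inds]]
remaining <- all[-inds]

# for each remaining data frame, keep the row closest to each reference row
# ([-1] drops the name column so only x and y are passed to distm)
output <- lapply(remaining, function(x) {
  x[apply(ref[-1], 1, function(y) {
    which.min(distm(y, as.matrix(x[-1]), fun = distGeo))
  }), ]
})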
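As a variation on the approach above, distm can also be called once per data frame on the full coordinate matrices. It then returns a matrix with one row per reference point and one column per candidate point, and which.min is applied to each row. This is only a sketch: the helper name nearest_rows is made up for illustration, and it assumes, as the code above does, that the first column holds the name and the other two columns are longitude and latitude in degrees (which is what distGeo expects).

```r
library(geosphere)

# Example data from the question
a <- data.frame(name = c("a1", "a2", "a3", "a4"),
                x = c(10, 15, 59, 21), y = c(12, 16, 20, 30))
b <- data.frame(name = c("b1", "b2", "b3", "b4", "b5"),
                x = c(8, 9, 2, -1, 13), y = c(7, 1, 5, 10, 0))
c <- data.frame(name = c("c1", "c2", "c3", "c4", "c5", "c6", "c7"),
                x = c(1, 5, 6, 2, 3, 10, -8), y = c(2, -3, 7, 4, 6, 15, 8))
all <- list(a = a, b = b, c = c)

# nearest_rows() is a hypothetical helper, not part of geosphere
nearest_rows <- function(x, ref) {
  # d[i, j] = distance from reference row i to row j of x
  d <- distm(as.matrix(ref[-1]), as.matrix(x[-1]), fun = distGeo)
  # for each reference row, pick the row of x with the smallest distance
  x[apply(d, 1, which.min), ]
}

inds <- which.min(sapply(all, nrow))
ref <- all[[inds]]
output <- lapply(all[-inds], nearest_rows, ref = ref)
```

Each element of output has as many rows as ref, one per reference point, so the same row of a data frame can appear more than once.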
Combined list of data frames:
c(list(ref), output)
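Note that list(ref) carries no name, so the reference element of the combined list is unnamed. If names like those in the question (a, sub_b, sub_c) are wanted, one possible sketch in base R is shown here. The toy data frames are stand-ins so the snippet runs on its own; in practice all, inds and output would come from the code above, and the sub_ prefix is simply the naming used in the question.

```r
# Toy stand-ins so the snippet is self-contained
all <- list(a = data.frame(name = "a1", x = 10, y = 12),
            b = data.frame(name = c("b1", "b2"), x = c(8, 9), y = c(7, 1)))
inds <- which.min(sapply(all, nrow))
output <- list(b = all$b[1, ])  # pretend row b1 was selected as closest

# all[inds] (single bracket) keeps the reference's name, unlike list(ref);
# the remaining elements get a "sub_" prefix
new.all <- c(all[inds], setNames(output, paste0("sub_", names(output))))
names(new.all)
# "a" "sub_b"
```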
Upvotes: 3