Reputation: 51
I have two data frames with different numbers of rows. I need to apply a set of functions to every possible pair of rows, with one row taken from the first data frame and the other from the second. I can do this with for loops, but I feel there must be a more efficient way. An example is given below. D1 and D2 are the two data frames. I need to build D3, whose first column is the Euclidean distance in the x-y plane and whose second column is the squared difference of the z values, for each pair of rows from D1 and D2.
D1 <- data.frame(x = 1:5, y = 6:10, z = rnorm(5))
D2 <- data.frame(x = 19:30, y = 41:52, z = rnorm(12))
D3 <- data.frame(distance = integer(0), difference = integer(0))
for (i in 1:nrow(D1)) {
  for (j in 1:nrow(D2)) {
    temp <- data.frame(distance = sqrt(sum((D1[i, 1:2] - D2[j, 1:2])^2)),
                       difference = (D1[i, 3] - D2[j, 3])^2)
    D3 <- rbind(D3, temp)
  }
}
Thank you
Upvotes: 1
Views: 651
Reputation: 42544
There is also a data.table solution:
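library(data.table)
# convert to data.table by reference and add a row-number column to each
setDT(D1)[, rn := .I]
setDT(D2)[, rn := .I]
# CJ() yields every (V1, V2) pair of row numbers; join D2 on V2, then D1 on V1.
# In the result, x, y, z come from D1 and i.x, i.y, i.z come from D2.
D1[D2[CJ(D1$rn, D2$rn), on = .(rn == V2)], on = .(rn == V1)][
  , .(distance = sqrt((x - i.x)^2 + (y - i.y)^2),
      difference = (z - i.z)^2)]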
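To make the role of CJ() in the join above easier to see, here is a minimal sketch on tiny made-up tables. The names A, B, idx, ia, ib and res are hypothetical and not part of the answer's code; the sketch indexes the columns directly instead of joining, so it is an illustration of the pairing logic rather than a replacement for the code above.

```r
library(data.table)

# Small hypothetical inputs (not the D1/D2 from the question)
A <- data.table(x = 1:2, y = 3:4, z = c(0.5, 1.5))
B <- data.table(x = 5:7, y = 8:10, z = c(1, 2, 3))

# CJ() builds every (row of A, row of B) index pair.
# The first column varies slowest, so the order matches a nested loop
# with A in the outer loop and B in the inner loop.
idx <- CJ(ia = seq_len(nrow(A)), ib = seq_len(nrow(B)))

# Use the index pairs to pick the matching rows and compute both metrics
res <- idx[, .(ia, ib,
               distance   = sqrt((A$x[ia] - B$x[ib])^2 + (A$y[ia] - B$y[ib])^2),
               difference = (A$z[ia] - B$z[ib])^2)]
```

The result has nrow(A) * nrow(B) rows, one per pair.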
Upvotes: 0
Reputation: 1054
You can write a separate function that computes the metrics from the row indices of each data.frame; here I call them i_D1 and i_D2.
# create function to compute the euclidean distance and z-difference
get_D3_values <- function(i_D1, i_D2) {
  dist_x <- D1[i_D1, "x"] - D2[i_D2, "x"]
  dist_y <- D1[i_D1, "y"] - D2[i_D2, "y"]
  distance <- sqrt(dist_x^2 + dist_y^2)
  difference <- (D1[i_D1, "z"] - D2[i_D2, "z"])^2
  return(
    list("i_D1" = i_D1, "i_D2" = i_D2,
         "distance" = distance, "difference" = difference)
  )
}
Next, use expand.grid to build a data frame containing every combination of row indices of D1 and D2.
D1 <- data.frame(x=1:5, y=6:10, z=rnorm(5))
D2 <- data.frame(x=19:30, y=41:52, z=rnorm(12))
# create a data frame with all combinations of row indices from D1 and D2
row_comb <- expand.grid("row_D1"=seq(nrow(D1)), "row_D2"=seq(nrow(D2)))
head(row_comb)
#   row_D1 row_D2
# 1      1      1
# 2      2      1
# 3      3      1
# 4      4      1
# 5      5      1
# 6      1      2
Then use mapply to apply the function to every row of row_comb.
result <- with(row_comb,
               mapply(FUN = get_D3_values, i_D1 = row_D1, i_D2 = row_D2, USE.NAMES = TRUE))
result <- data.frame(t(result))
head(result)
#   i_D1 i_D2 distance difference
# 1    1    1 39.35734 0.08479992
# 2    2    1 38.01316 1.155829
# 3    3    1 36.67424 2.858793
# 4    4    1 35.34119 0.8642712
# 5    5    1 34.0147  0.3030355
# 6    1    2 40.70626 2.657727
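Since arithmetic in R is vectorized, the same index grid can also be used without calling a function once per pair. The following is a minimal sketch of that variant, not the approach shown above; the seed and the names a, b and result_vec are assumptions added for illustration.

```r
set.seed(1)  # hypothetical seed, only so the sketch is reproducible
D1 <- data.frame(x = 1:5, y = 6:10, z = rnorm(5))
D2 <- data.frame(x = 19:30, y = 41:52, z = rnorm(12))

# every combination of row indices, as in the answer above
row_comb <- expand.grid(row_D1 = seq_len(nrow(D1)), row_D2 = seq_len(nrow(D2)))

# expand each data frame so that row k of `a` and row k of `b` form pair k
a <- D1[row_comb$row_D1, ]
b <- D2[row_comb$row_D2, ]

# compute both metrics for all pairs in one vectorized step
result_vec <- data.frame(
  i_D1       = row_comb$row_D1,
  i_D2       = row_comb$row_D2,
  distance   = sqrt((a$x - b$x)^2 + (a$y - b$y)^2),
  difference = (a$z - b$z)^2
)
```

Unlike the mapply version, this yields plain numeric columns rather than list columns, which tends to be easier to work with afterwards.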
Upvotes: 0
Reputation: 540
Merge the two data frames with an empty by to get every combination of rows (a cross join):
D3 <- merge(D1, D2, by = c())
Because both inputs share column names, merge() suffixes them with .x (from D1) and .y (from D2). Then use purrr::map_dbl to apply the distance and difference calculations to each row of D3; map_dbl returns a plain numeric vector with one value per row:
resdistance <- purrr::map_dbl(seq_len(nrow(D3)), function(ind) {
  sqrt((D3[ind, "x.x"] - D3[ind, "x.y"])^2 + (D3[ind, "y.x"] - D3[ind, "y.y"])^2)
})
resdifference <- purrr::map_dbl(seq_len(nrow(D3)), function(ind) {
  (D3[ind, "z.x"] - D3[ind, "z.y"])^2
})
You can then combine the two vectors to get your desired result:
result <- data.frame(distance = resdistance, difference = resdifference)
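Once the cross join exists, the per-row iteration is arguably optional, because the suffixed columns can be combined directly. A minimal base R sketch under that assumption follows; the names all_pairs and result_vec are hypothetical, and it relies on merge() adding the default .x and .y suffixes to the shared column names.

```r
D1 <- data.frame(x = 1:5, y = 6:10, z = rnorm(5))
D2 <- data.frame(x = 19:30, y = 41:52, z = rnorm(12))

# by = NULL (equivalent to by = c()) requests the Cartesian product
all_pairs <- merge(D1, D2, by = NULL)

# columns ending in .x come from D1, columns ending in .y come from D2
result_vec <- with(all_pairs, data.frame(
  distance   = sqrt((x.x - x.y)^2 + (y.x - y.y)^2),
  difference = (z.x - z.y)^2
))
```

Note that the rows come out with D1 varying fastest, which differs from the row order produced by the nested loop in the question.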
Upvotes: 1