Saubhagya

Reputation: 51

R: Applying a function to each possible pair of rows from two different data frames

I have two data frames with different numbers of rows. I need to apply a set of functions to every possible combination of rows, with one row taken from the first data frame and the other from the second. I can do this with for loops, but I suspect there is a more efficient way. An example is given below: D1 and D2 are the two data frames, and I need to compute D3, whose first column is the Euclidean distance in the x-y plane and whose second column is the squared difference of the z values, for each pair of rows from D1 and D2.

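D1 <- data.frame(x = 1:5, y = 6:10, z = rnorm(5))
D2 <- data.frame(x = 19:30, y = 41:52, z = rnorm(12))
# empty result, grown inside the loop
D3 <- data.frame(distance = numeric(0), difference = numeric(0))

for (i in 1:nrow(D1)) {
  for (j in 1:nrow(D2)) {
    # Euclidean distance in the x-y plane and squared difference of z
    temp <- data.frame(distance = sqrt(sum((D1[i, 1:2] - D2[j, 1:2])^2)),
                       difference = (D1[i, 3] - D2[j, 3])^2)
    D3 <- rbind(D3, temp)  # growing a data frame row by row copies it every time
  }
}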
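For comparison, here is a minimal loop-free sketch of the same computation in base R (not taken from the post or the answers). It uses outer() to build every D1-row/D2-row difference as a matrix at once; set.seed(42) is added only so the example is reproducible.

```r
# Loop-free version: outer() evaluates every (D1 row, D2 row) pair at once,
# giving nrow(D1) x nrow(D2) matrices of coordinate differences.
set.seed(42)  # only for reproducibility
D1 <- data.frame(x = 1:5, y = 6:10, z = rnorm(5))
D2 <- data.frame(x = 19:30, y = 41:52, z = rnorm(12))

dx <- outer(D1$x, D2$x, "-")  # dx[i, j] = D1$x[i] - D2$x[j]
dy <- outer(D1$y, D2$y, "-")
dz <- outer(D1$z, D2$z, "-")

# t() before as.vector() keeps the loop's row order:
# D1 row i in the outer position, D2 row j varying fastest
D3 <- data.frame(
  distance   = as.vector(t(sqrt(dx^2 + dy^2))),
  difference = as.vector(t(dz^2))
)
head(D3)
```

Each matrix is only nrow(D1) x nrow(D2), so this stays cheap for moderately sized inputs.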

Thank you

Upvotes: 1

Views: 651

Answers (3)

Uwe

Reputation: 42544

There is also a data.table solution:

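library(data.table)
# convert both data frames to data.tables in place (setDT modifies by reference)
# and add a row-number column rn
setDT(D1)[, rn := .I]
setDT(D2)[, rn := .I]
# CJ() builds all (D1 row, D2 row) index pairs as columns V1 and V2;
# join D2 on V2, then D1 on V1. In the final join, D2's clashing columns
# are prefixed with "i." (i.x, i.y, i.z)
D1[D2[CJ(D1$rn, D2$rn), on = .(rn == V2)], on = .(rn == V1)][
  , .(distance = sqrt((x - i.x)^2 + (y - i.y)^2),
      difference = (z - i.z)^2)]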

Upvotes: 0

Rafael Toledo

Reputation: 1054

You can write a separate function that computes the metrics for a given pair of row indices, one into each data frame; here I call them i_D1 and i_D2.

# create function to compute the euclidean distance and z-difference
get_D3_values <- function(i_D1, i_D2){
        dist_x <- D1[i_D1, "x"] - D2[i_D2, "x"]
        dist_y <- D1[i_D1, "y"] - D2[i_D2, "y"]

        distance <- sqrt(dist_x^2 + dist_y^2)

        difference <- (D1[i_D1, "z"] - D2[i_D2, "z"])^2


        return(
                list("i_D1"=i_D1, "i_D2"=i_D2, 
                     "distance"=distance, "difference"=difference)
        )
}
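A quick usage check of get_D3_values on the first pair of rows. The z values here are made up (deterministic instead of rnorm) so the numbers are reproducible. The distance for rows (1, 1) depends only on x and y, so it is sqrt((1 - 19)^2 + (6 - 41)^2) = sqrt(1549), about 39.357, which agrees with the first row of the output further down.

```r
# Made-up, deterministic z values (instead of rnorm) so the output is reproducible
D1 <- data.frame(x = 1:5, y = 6:10, z = c(0.5, -1, 2, 0, 1))
D2 <- data.frame(x = 19:30, y = 41:52, z = seq(-1, 1, length.out = 12))

# Same function as in the answer
get_D3_values <- function(i_D1, i_D2){
  dist_x <- D1[i_D1, "x"] - D2[i_D2, "x"]
  dist_y <- D1[i_D1, "y"] - D2[i_D2, "y"]
  distance <- sqrt(dist_x^2 + dist_y^2)
  difference <- (D1[i_D1, "z"] - D2[i_D2, "z"])^2
  list("i_D1" = i_D1, "i_D2" = i_D2,
       "distance" = distance, "difference" = difference)
}

res <- get_D3_values(1, 1)
res$distance    # sqrt((1 - 19)^2 + (6 - 41)^2) = sqrt(1549), about 39.357
res$difference  # (0.5 - (-1))^2 = 2.25
```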

Next, use expand.grid to create a data frame holding every combination of the row indices of D1 and D2.

D1 <- data.frame(x=1:5, y=6:10, z=rnorm(5))
D2 <- data.frame(x=19:30, y=41:52, z=rnorm(12))

# create a data frame with all combinations of the row indices of D1 and D2
row_comb <- expand.grid("row_D1"=seq(nrow(D1)), "row_D2"=seq(nrow(D2)))

head(row_comb)

#  row_D1 row_D2
#1      1      1
#2      2      1
#3      3      1
#4      4      1
#5      5      1
#6      1      2

Then use mapply to apply the function to every row of row_comb.

result <- with(row_comb, 
               mapply(FUN=get_D3_values, i_D1=row_D1, i_D2=row_D2, USE.NAMES=TRUE))

result <- data.frame(t(result))

head(result)

#  i_D1 i_D2 distance difference
#1    1    1 39.35734 0.08479992
#2    2    1 38.01316   1.155829
#3    3    1 36.67424   2.858793
#4    4    1 35.34119  0.8642712
#5    5    1  34.0147  0.3030355
#6    1    2 40.70626   2.657727
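Because get_D3_values returns a list, mapply() produces a matrix of mode list, so the columns of result come back as lists rather than numeric vectors (which is why the printed digits are ragged). A sketch of a variant, with the hypothetical name get_D3_values_num, that returns a named numeric vector so mapply() simplifies to an ordinary numeric matrix; set.seed(42) is only for reproducibility.

```r
set.seed(42)  # only for reproducibility
D1 <- data.frame(x = 1:5, y = 6:10, z = rnorm(5))
D2 <- data.frame(x = 19:30, y = 41:52, z = rnorm(12))

# Hypothetical variant of get_D3_values: returns a named numeric vector
# instead of a list, so mapply() can simplify the results to a numeric matrix
get_D3_values_num <- function(i_D1, i_D2) {
  dist_x <- D1$x[i_D1] - D2$x[i_D2]
  dist_y <- D1$y[i_D1] - D2$y[i_D2]
  c(i_D1 = i_D1, i_D2 = i_D2,
    distance = sqrt(dist_x^2 + dist_y^2),
    difference = (D1$z[i_D1] - D2$z[i_D2])^2)
}

row_comb <- expand.grid(row_D1 = seq_len(nrow(D1)), row_D2 = seq_len(nrow(D2)))

# 4 x 60 numeric matrix; transpose so each pair is one row
result <- with(row_comb, mapply(get_D3_values_num, row_D1, row_D2))
result <- as.data.frame(t(result))
str(result)  # all four columns are numeric
```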

Upvotes: 0

Raj Padmanabhan

Reputation: 540

Merge the two data frames with by = NULL to get every combination of rows (their Cartesian product):

D3 <- merge(D1, D2, by = NULL)

Columns that appear in both inputs get the suffixes .x (from D1) and .y (from D2), so D3 has x.x, y.x, z.x, x.y, y.y and z.y. Then use purrr::map_dbl to apply the distance and difference calculations to every row of D3; map_dbl returns a plain numeric vector with one value per row:

resdistance <- purrr::map_dbl(seq_len(nrow(D3)), function(ind) {
  sqrt((D3$x.x[ind] - D3$x.y[ind])^2 + (D3$y.x[ind] - D3$y.y[ind])^2)
})

resdifference <- purrr::map_dbl(seq_len(nrow(D3)), function(ind) {
  (D3$z.x[ind] - D3$z.y[ind])^2
})

You can then combine the two vectors into the desired result:

result <- data.frame(distance = resdistance, difference = resdifference)
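Since merge() already lines up every pair of rows, the per-row purrr::map step can also be replaced by plain vectorized column arithmetic on the merged frame. A minimal sketch of that swap (set.seed(42) added only for reproducibility):

```r
set.seed(42)  # only for reproducibility
D1 <- data.frame(x = 1:5, y = 6:10, z = rnorm(5))
D2 <- data.frame(x = 19:30, y = 41:52, z = rnorm(12))

# by = NULL gives the Cartesian product; column names present in both inputs
# get the suffixes .x (from D1) and .y (from D2)
D3 <- merge(D1, D2, by = NULL)

# Vectorized column arithmetic: one value per row of D3, no per-row function calls
result <- with(D3, data.frame(
  distance   = sqrt((x.x - x.y)^2 + (y.x - y.y)^2),
  difference = (z.x - z.y)^2
))
head(result)
```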

Upvotes: 1
