Gongon
Gongon

Reputation: 505

How to calculate the difference between different data frames with common column names

I have three data frames and trying to calculate the difference between two data frames (Df2 and Df3) conditioned by data frame 1. As explained in following example I have three data frames, Df1, Df2 and Df3 with common names. In first step, in Df1, I want to compare the values of “standard” column with all three columns, “Das”,”Dss” and ”Tri” probably row wise and where ever any value of these columns, “Das”, “Dss” and “Tri” is higher than the “Standard” in Df1, calculate the difference of same position in Df2 and Df3 and put the difference in a separate column.

Df1             
    Names   Standard    Das Dss Tri
    Aa  3   3   6   2
    Ab  4   6   4   3
    Ac  2   5   2   4
    Ad  4   3   3   8
    Ae  6   4   5   7
    Af  4   5   7   5
    Ag  2   6   8   2
    Ah  9   7   6   2

Df2         
    Names   Das Dss Tri
    Aa  4   2   5
    Ab  7   5   4
    Ac  5   7   2
    Ad  6   4   3
    Ae  5   3   5
    Af  3   2   6
    Ag  2   5   4
    Ah  4   6   3

Df3

Names   Das Dss Tri
    Aa  5   3   5
    Ab  8   5   4
    Ac  6   7   2
    Ad  6   4   7
    Ae  5   3   8
    Af  4   5   6
    Ag  1   5   4
    Ah  4   6   3

Final Ouput

Df3             
    Names   Das Dss Tri Difference
    Aa  5   3   5   -1
    Ab  8   5   4   -1
    Ac  6   7   2   -1
    Ad  6   4   7   -4
    Ae  5   3   8   -3
    Af  4   5   6   -4
    Ag  1   5   4   1
    Ah  4   6   3   0

Upvotes: 4

Views: 1561

Answers (2)

Arun
Arun

Reputation: 118889

Here's the script that takes the index of the first biggest value if more than 1 value is found and if no values are found, NA is returned.

df1 <- structure(list(standard = c(3, 4, 2, 4, 6, 4, 2, 9), das = c(3, 
6, 5, 3, 4, 5, 6, 7), dss = c(6, 4, 2, 3, 5, 7, 8, 6), tri = c(2, 
3, 4, 8, 7, 5, 2, 2)), .Names = c("standard", "das", "dss", "tri"
), row.names = c(NA, -8L), class = "data.frame")

df2 <- structure(list(das = c(4, 7, 5, 6, 5, 3, 2, 4), dss = c(2, 
5, 7, 4, 3, 2, 5, 6), tri = c(5,4,2,3,5,6,4,3)), .Names = c("das", "dss", "tri"
), row.names = c(NA, -8L), class = "data.frame")

df3 <- structure(list(das = c(5, 8, 6, 6, 5, 4, 1, 4), dss = c(3, 
     5, 7, 4, 3, 5, 5, 6), tri = c(5,4,2,7,8,6,4,3)), .Names = c("das", "dss", "tri"
 ), row.names = c(NA, -8L), class = "data.frame")

# get indices. run through every row of df1
# and get the maximum column index > standard
idx.v <- sapply( 1:nrow(df1), function(idx) {
    t <- which(df1[idx, 2:4] > df1[idx, 1])
})

df3$result <- sapply(1:length(idx.v), function(ix) {
    col.idx <- idx.v[[ix]]
    len.idx <- length(col.idx)
    if (len.idx > 0) {
        res <- sum(df2[ix, col.idx] - df3[ix, col.idx])
    } else {
        res <- NA
    }
})

Output:

> df3
  das dss tri result
1   5   3   5     -1
2   8   5   4     -1
3   6   7   2     -1
4   6   4   7     -4
5   5   3   8     -3
6   4   5   6     -4
7   1   5   4      1
8   4   6   3     NA

Thanks for the chat. This is what you require.

Upvotes: 1

Matthew Lundberg
Matthew Lundberg

Reputation: 42689

I think this is the correct result, but note that the seventh value differs. Using the max value of the three columns (an easier task) produces a result that differs in even more slots.

df1.w <- sapply( seq(1, nrow(df1)), 
                 function(idx) min(c(Inf, which(df1[-(1:2)][idx,] > df1[idx, 2])))
                )

df1.mat <- matrix(c(seq(1, nrow(df1)), df1.w), ncol=2)
df1.mat[is.infinite(df1.mat)] <- 1

ifelse(is.infinite(df1.w), 0, 
       df2[-1][df1.mat] - df3[-1][df1.mat]
       )

## [1] -1 -1 -1 -4 -3 -1  1  0

If you actually do want to use the index of the max value in df1[-(1:2)], replace the definition of df1.w (the sapply call) with this:

df1.w <- apply(df1[-(1:2)], 1, which.max)

Using the rest of the code above then gives this result:

## [1] -1 -1 -1 -4 -3 -3  0  0

Upvotes: 1

Related Questions