tob

Reputation: 95

How can I find the difference between values of rows with the same colnames / rownames?

I would like to find the difference between the B values for every combination of 2 rows, grouped by column A.

The input data:

 A        B
11   320836
11  5719750
 6 29911154
 6 29912280
 6 29912285    

Below is the expected output:

 A        B          C  Difference
11   320836    5719750     5398914
 6 29911154   29912280        1126
 6 29911154   29912285        1131
 6 29912280   29912285           5

Upvotes: 0

Views: 132

Answers (3)

Rich Scriven

Reputation: 99321

Here's a possibility with the data.table package.

library(data.table)

We can quickly calculate the difference by using diff() inside combn(), grouped by A.

setDT(df)[, combn(B, 2, diff), by = A]
#     A      V1
# 1: 11 5398914
# 2:  6    1126
# 3:  6    1131
# 4:  6       5

To get all your required columns, we can do a bit more work. The combn() function can be used to get the combinations of two elements. Then we can create a named list for the three new columns from the result of combn(). All this is grouped by A.

setDT(df)[, {
    cmb <- combn(B, 2)
    .(B = cmb[1, ], C = cmb[2, ], Diff = cmb[2, ] - cmb[1, ])
}, by = A]
#     A        B        C    Diff
# 1: 11   320836  5719750 5398914
# 2:  6 29911154 29912280    1126
# 3:  6 29911154 29912285    1131
# 4:  6 29912280 29912285       5
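
For anyone unfamiliar with combn(), here is a minimal base R sketch of what it returns for the A = 6 group on its own. The variable names B and cmb are only illustrative.

```r
# Values of B for the group A = 6 (taken from the question's input data)
B <- c(29911154, 29912280, 29912285)

# combn(B, 2) returns a 2 x 3 matrix: each column is one pair of values
cmb <- combn(B, 2)

# Passing a function as the third argument applies it to each pair;
# diff() on a pair c(x, y) gives y - x
diffs <- combn(B, 2, diff)
diffs
# differences: 1126, 1131, 5
```

Row 1 of cmb holds the first element of each pair and row 2 the second, which is why the answer above subtracts cmb[1, ] from cmb[2, ].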

Upvotes: 5

Wyldsoul

Reputation: 1553

Here is a dplyr option using combn:

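df <- read.table(textConnection("
A   B
11 320836
11 5719750
6 29911154
6 29912280
6 29912285"), header = TRUE)

library(dplyr)

# For each group of A, build every 2-element combination of B.
# t() turns the 2 x n matrix from combn() into one row per pair (V1, V2).
df2 <- df %>%
  group_by(A) %>%
  do(as.data.frame(t(combn(.[["B"]], 2)))) %>%
  as.data.frame()

# Difference between the second and the first value of each pair
df2$diff <- df2$V2 - df2$V1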
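
One caveat with combn(): when it is given a single positive number n, it treats it as seq_len(n), so a group with only one row would not behave as intended. A rough base R sketch of a guard follows. The helper name pair_diffs and the extra single-row group A = 3 are assumptions for illustration and are not part of the question.

```r
# Hypothetical helper: all pairwise differences for one group's B values,
# returning an empty data frame when the group has fewer than 2 rows
pair_diffs <- function(b) {
  if (length(b) < 2) {
    return(data.frame(B = numeric(0), C = numeric(0), Diff = numeric(0)))
  }
  cmb <- combn(b, 2)
  data.frame(B = cmb[1, ], C = cmb[2, ], Diff = cmb[2, ] - cmb[1, ])
}

# Question's data plus one made-up single-row group (A = 3)
df <- data.frame(
  A = c(11, 11, 6, 6, 6, 3),
  B = c(320836, 5719750, 29911154, 29912280, 29912285, 42)
)

# Apply the helper per group and stack the pieces
res <- do.call(rbind, lapply(split(df, df$A), function(g) {
  d <- pair_diffs(g$B)
  data.frame(A = rep(g$A[1], nrow(d)), d)
}))
res
```

The single-row group simply contributes no rows to the result.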

Upvotes: 0

M. Siwik

Reputation: 497

#rm(list = ls())

A = c(11,11,6,6,6)
B = c(320836, 5719750, 29911154, 29912280, 29912285)

data <- cbind(A, B)


library("dplyr")

data <- as.data.frame(data)
output <- c(1,2,3)

for (i in unique(data$A)) {

  numA <- i
  a = unique(data[data$A == i,2])
  temp <- expand.grid(a,a) 
  temp$A <- i
  temp <- arrange(temp, Var1)
  output <- rbind(output, temp)
}
output <- output[-1,] # remove the placeholder first row (the initial c(1,2,3))
output$diff <- output$Var1 - output$Var2

Analyze this answer. I don't delete the symmetric duplicate rows, but I think this idea will help you.

       Var1     Var2  A     diff
2    320836   320836 11        0
3    320836  5719750 11 -5398914
4   5719750   320836 11  5398914
5   5719750  5719750 11        0
6  29911154 29911154  6        0
7  29911154 29912280  6    -1126
8  29911154 29912285  6    -1131
9  29912280 29911154  6     1126
10 29912280 29912280  6        0
11 29912280 29912285  6       -5
12 29912285 29911154  6     1131
13 29912285 29912280  6        5
14 29912285 29912285  6        0
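
To drop the symmetric duplicates and the zero self-pairs, one option is to keep only the rows where Var1 < Var2. This is a sketch for a single group, assuming the values within a group are distinct and that the smaller value should come first, as in the expected output of the question.

```r
# Values of B for the group A = 6
a <- c(29911154, 29912280, 29912285)

# All ordered pairs, including self-pairs and mirrored pairs
pairs <- expand.grid(Var1 = a, Var2 = a)

# Keep each unordered pair once: smaller value in Var1, larger in Var2
pairs <- pairs[pairs$Var1 < pairs$Var2, ]

# Positive difference for each remaining pair
pairs$diff <- pairs$Var2 - pairs$Var1
pairs
# differences: 1126, 1131, 5
```

The same filter could be applied to the full output data frame from the loop before computing diff.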

Upvotes: 0
