otterdog2000
otterdog2000

Reputation: 363

Subtracting column data in all pair permutations R

I have hourly price data in dataframe in which I need to subtract all permutations to find the best pairing for a financial trade. Each column (not including pricedate, hour) can be considered the closing price of that stock for that particular pricedate and hour. This is the data:

test <- data.frame(pricedate = as.Date('2019-12-18'), hour = c(1,2,3,4,5), A = c(3,5,6,4,2), B = c(5,3,2,6,7), C = c(1,2,3,6,9))

I want to get a new dataframe (or table) of the difference between all permutation combinations. Therefore, "A subtract B" is different than "B subtract A". And I don't need a column subtracted from itself. The resulting table would look something like this:


Pricedate      Hour        A-B    A-C   B-A    B-C   C-A   C-B 

2019-12-18      1          -2      2     2      4     -2    -4
2019-12-18      2           2      3    -2      1     -3    -1
.
.
.

I believe I need the data to stay in this form because I'd like to calculate some financial statistics in R after this.

Upvotes: 3

Views: 1114

Answers (3)

Ronak Shah
Ronak Shah

Reputation: 388982

Another base R approach using combn. Since B-A is -(A-B) we can use combn to calculate difference between every 2 values and get the other combination by negating the calculated values.

cols <- combn(names(test)[3:5], 2, paste, collapse = "-")
cols <- c(cols, sub("(.)-(.)", "\\2-\\1", cols))

test[cols] <- t(apply(test[3:5], 1, function(x) {
     out <- combn(x, 2, function(x) x[1] - x[2])
     c(out, -out)
}))

test
#   pricedate hour A B C A-B A-C B-C B-A C-A C-B
#1 2019-12-18    1 3 5 1  -2   2   4   2  -2  -4
#2 2019-12-18    2 5 3 2   2   3   1  -2  -3  -1
#3 2019-12-18    3 6 2 3   4   3  -1  -4  -3   1
#4 2019-12-18    4 4 6 6  -2  -2   0   2   2   0
#5 2019-12-18    5 2 7 9  -5  -7  -2   5   7   2

Upvotes: 2

r2evans
r2evans

Reputation: 160447

I know you listed dplyr and tidy as tags, but here's a base approach that can easily be used within a tidy pipe:

somefunc <- function(x) {
  as.data.frame(t(apply(x, 1, function(z) {
    df <- as.data.frame.table(outer(z, z, `-`))
    df <- df[ df[[1]] != df[[2]], ]
    setNames(df[[3]], paste(df[[1]], df[[2]], sep = "_"))
  })))
}
somefunc(test[3:5])
#   B_A C_A A_B C_B A_C B_C
# 1   2  -2  -2  -4   2   4
# 2  -2  -3   2  -1   3   1
# 3  -4  -3   4   1   3  -1
# 4   2   2  -2   0  -2   0
# 5   5   7  -5   2  -7  -2

Upvotes: 4

Jon Spring
Jon Spring

Reputation: 66500

Here's a tidyverse approach. First, we convert to longer form, where each column A:C is represented in a new row, and the designation of which column it came from goes in a new column called "col". Then we join that table to itself, so every row is combined with all the rows which share the same date and hour.

Then we calc the difference, filter out the rows where rows are subtracted from themselves, unite the two column headers into a single identifying column, and convert back to wide format.

library(tidyverse)

test_longer <- test %>%
  pivot_longer(A:C, names_to = "col", values_to = "val")

test_longer %>%
  left_join(test_longer, 
            suffix = c("1", "2"),
            by = c("pricedate", "hour")) %>%
  filter(col1 != col2) %>%
  mutate(dif = val1 - val2) %>%
  unite("col", c(col1, col2), sep = "-") %>%
  select(-c(val1, val2)) %>%
  pivot_wider(names_from = col, values_from = dif)


# A tibble: 5 x 8
  pricedate   hour `A-B` `A-C` `B-A` `B-C` `C-A` `C-B`
  <date>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2019-12-18     1    -2     2     2     4    -2    -4
2 2019-12-18     2     2     3    -2     1    -3    -1
3 2019-12-18     3     4     3    -4    -1    -3     1
4 2019-12-18     4    -2    -2     2     0     2     0
5 2019-12-18     5    -5    -7     5    -2     7     2

Upvotes: 7

Related Questions