gaspers
gaspers

Reputation: 77

Applying function on iterative combinations of columns in data frame in R

I want to build a particular function, which sums rows of different combinations of columns in whole data frame.

Example - I have a data frame with 1st descriptive column and 20 columns (v1, v2, v3, ...) from 1 to 20, where only possible values are 1 or 0. I want a function, where I can sum values from columns based on specific occurrence. If per example function would be set on 3, then I need that first 3 columns (v1, v2 and v3) would be summed together in first newly created variable (v1s), then next 3 columns would be summed together (v2, v3, v4) in next newly created variable (v2s), etc.

Reproducible example:

set.seed(300) 
db <- matrix(sample(0:1,5*20, replace=TRUE),5,20)
us <- c("A","B","C","D","E")
db <- as.data.frame(cbind(us,db))

So I want to be in control to defining how many columns do I want summed up (by in control I mean it shouldn't be hard to change this to 5 columns per example). In case of 3 columns, I want to create 18 additional vars (all continuing combos of 3 variables from 1st to last) containing row sums of 3 columns.

I tried few options, however I can't really figure out easy way to do it. I was thinking of a function, which takes as input number of columns and then rowsums up values and iterates over all combos, however I can't manage to crack this one. Please help!

Upvotes: 0

Views: 334

Answers (2)

Uwe
Uwe

Reputation: 42544

If I understand the question correctly the result can be obtained by reshaping the data from wide to long format, summing over a rolling window for each group id us and by reshaping back to wide format:

library(data.table)
# define number of columns to sum over
width <- 3L
# reshape from wide to long format
melt(setDT(db), id = "us")[
  # make sure that column values can be added
  , value := as.integer(value)][
    # sum across a rolling window for each group
    , Reduce("+", shift(value, n = 0:(width - 1L), type = "lead")), by = us][
      # remove rows from incomplete window sizes
      !is.na(V1)][
        # reshape to wide format again
        , dcast(.SD, us ~ sprintf("S%02i", rowid(us)))]

which returns

   us S01 S02 S03 S04 S05 S06 S07 S08 S09 S10 S11 S12 S13 S14 S15 S16 S17 S18
1:  A   2   2   3   2   2   1   1   1   1   2   1   2   2   2   2   1   2   1
2:  B   2   2   1   2   2   2   1   1   2   2   2   1   2   1   1   1   1   2
3:  C   2   2   2   2   2   3   3   2   2   2   3   3   2   2   1   2   2   2
4:  D   2   2   2   2   1   2   2   2   2   2   2   2   1   1   1   2   3   3
5:  E   3   3   2   1   0   1   1   2   2   2   1   1   2   3   2   1   1   1

For n <- 5L we get

   us S01 S02 S03 S04 S05 S06 S07 S08 S09 S10 S11 S12 S13 S14 S15 S16
1:  A   4   3   4   3   2   2   2   2   2   3   3   3   3   3   3   2
2:  B   3   3   3   3   2   3   3   2   3   3   3   2   2   2   2   2
3:  C   3   3   4   4   4   4   4   4   4   4   4   4   3   3   3   3
4:  D   3   3   3   3   3   3   3   4   3   3   3   2   2   3   3   4
5:  E   4   3   2   2   1   2   3   3   2   3   3   3   3   3   3   2

As prerequisite, db must not contain any NA values.

Upvotes: 1

Martin Schmelzer
Martin Schmelzer

Reputation: 23889

There might be an easier way. Anyhow, here is my approach:

Code:

grp_colsum <- function(df, n) {
  # generate the set of indices 123, 234, 345, ...
  idx <- sapply(1:n, `+`, 0:(ncol(df)-n))
  # for each set of indices
  res <- apply(idx, 1, function(x) {
    rowSums(df[,x])
  })
  colnames(res) <- paste0("v", 1:ncol(res), "s")
  res
}
grp_colsum(db[,-1], 3)

Output:

    v1s v2s v3s v4s v5s v6s v7s v8s v9s v10s v11s v12s v13s v14s v15s v16s v17s v18s
[1,]   2   2   3   2   2   1   1   1   1    2    1    2    2    2    2    1    2    1
[2,]   2   2   1   2   2   2   1   1   2    2    2    1    2    1    1    1    1    2
[3,]   2   2   2   2   2   3   3   2   2    2    3    3    2    2    1    2    2    2
[4,]   2   2   2   2   1   2   2   2   2    2    2    2    1    1    1    2    3    3
[5,]   3   3   2   1   0   1   1   2   2    2    1    1    2    3    2    1    1    1

Upvotes: 2

Related Questions