Chase
Chase

Reputation: 69251

Operate on multiple columns at once

What is the programmatic way to operate on multiple columns (or rows) at a time? Consider the example:

df2 <- data.frame(a = 1:10, b = 5:14, c = 10:1, d = 14:5)

I'd like to create two new columns that contain the ratio of a/b and c/d. For simplicity sake, let's assume that the spatial relationship between all of the columns is consistent. With only two columns to make, one might as well do something like this:

df2$ab <- with(df2, a/b)
df2$cd <- with(df2, c/d)

or

df2 <- transform(df2, ab = a/b, cd = c/d)

To create a final product like:

    a  b  c  d        ab        cd
1   1  5 10 14 0.2000000 0.7142857
2   2  6  9 13 0.3333333 0.6923077
3   3  7  8 12 0.4285714 0.6666667
4   4  8  7 11 0.5000000 0.6363636
5   5  9  6 10 0.5555556 0.6000000
6   6 10  5  9 0.6000000 0.5555556
7   7 11  4  8 0.6363636 0.5000000
8   8 12  3  7 0.6666667 0.4285714
9   9 13  2  6 0.6923077 0.3333333
10 10 14  1  5 0.7142857 0.2000000

But what if there are 100s of columns that you need to iterate through? I'd normally use Excel or SQL to concatenate the code and drop it in my .R script, but it seems there must be a way to solve this with R.

Full disclosure: this question is related to a question on cross validated where I hacked together a solution that cannot be optimal.

Upvotes: 5

Views: 5016

Answers (2)

Chase
Chase

Reputation: 69251

How about this solution? It avoids cbinding the object to itself multiple times:

for (i in seq(1, ncol(df2), by = 2)) {
    df2[, paste(names(df2)[i], names(df2)[i+1], sep = "", collapse = "")] <- df2[, i] / df2[, i + 1]
}

EDIT: It looks like my solution is marginally faster than @djhurio's

> set.seed(42)
> df2 <- data.frame(
+       a = sample(1:10, 10e6, T)
+ , b = sample(1:10, 10e6, T)
+ , c = sample(1:10, 10e6, T)
+ , d = sample(1:10, 10e6, T)
+ )
> 
> system.time(
+ for (i in seq(1, ncol(df2), by = 2)) {
+ df2[, paste(names(df2)[i], names(df2)[i+1], sep = "", collapse = "")] <- df2[, i] / df2[, i + 1]
+ }
+ )
   user  system elapsed 
   1.06    0.64    1.70 
> 
> foo <- function(df2) {
+ n <- ncol(df2)
+ s1 <- seq(1, n, 2)
+ s2 <- seq(2, n, 2)
+ df2 <- cbind(df2, df2[, s1] / df2[, s2])
+ names(df2)[(n+1):ncol(df2)] <- paste(names(df2)[s1], names(df2)[s2], sep="")
+ return(df2)
+ }
> 
> set.seed(42)
> df2 <- data.frame(
+       a = sample(1:10, 10e6, T)
+ , b = sample(1:10, 10e6, T)
+ , c = sample(1:10, 10e6, T)
+ , d = sample(1:10, 10e6, T)
+ )
> system.time(foo(df2))
   user  system elapsed 
   1.73    0.85    2.57 

Upvotes: 0

djhurio
djhurio

Reputation: 5536

Try this solution:

df2 <- data.frame(a = 1:10, b = 5:14, c = 10:1, d = 14:5, e = 34:43, f = 56:65)

n <- ncol(df2)
s1 <- seq(1, n, 2)
s2 <- seq(2, n, 2)

df2[paste(names(df2)[s1], names(df2)[s2], sep="")] <- df2[, s1] / df2[, s2]

Upvotes: 5

Related Questions