Reputation: 69251
What is the programmatic way to operate on multiple columns (or rows) at a time? Consider the example:
df2 <- data.frame(a = 1:10, b = 5:14, c = 10:1, d = 14:5)
I'd like to create two new columns that contain the ratio of a/b and c/d. For simplicity sake, let's assume that the spatial relationship between all of the columns is consistent. With only two columns to make, one might as well do something like this:
df2$ab <- with(df2, a/b)
df2$cd <- with(df2, c/d)
or
df2 <- transform(df2, ab = a/b, cd = c/d)
To create a final product like:
a b c d ab cd
1 1 5 10 14 0.2000000 0.7142857
2 2 6 9 13 0.3333333 0.6923077
3 3 7 8 12 0.4285714 0.6666667
4 4 8 7 11 0.5000000 0.6363636
5 5 9 6 10 0.5555556 0.6000000
6 6 10 5 9 0.6000000 0.5555556
7 7 11 4 8 0.6363636 0.5000000
8 8 12 3 7 0.6666667 0.4285714
9 9 13 2 6 0.6923077 0.3333333
10 10 14 1 5 0.7142857 0.2000000
But what if there are 100s of columns that you need to iterate through? I'd normally use Excel or SQL to concatenate the code and drop it in my .R script, but it seems there must be a way to solve this with R.
Full disclosure: this question is related to a question on cross validated where I hacked together a solution that cannot be optimal.
Upvotes: 5
Views: 5016
Reputation: 69251
How about this solution? It avoids cbind
ing the object to itself multiple times:
for (i in seq(1, ncol(df2), by = 2)) {
df2[, paste(names(df2)[i], names(df2)[i+1], sep = "", collapse = "")] <- df2[, i] / df2[, i + 1]
}
EDIT: It looks like my solution is marginally faster than @djhurio's
> set.seed(42)
> df2 <- data.frame(
+ a = sample(1:10, 10e6, T)
+ , b = sample(1:10, 10e6, T)
+ , c = sample(1:10, 10e6, T)
+ , d = sample(1:10, 10e6, T)
+ )
>
> system.time(
+ for (i in seq(1, ncol(df2), by = 2)) {
+ df2[, paste(names(df2)[i], names(df2)[i+1], sep = "", collapse = "")] <- df2[, i] / df2[, i + 1]
+ }
+ )
user system elapsed
1.06 0.64 1.70
>
> foo <- function(df2) {
+ n <- ncol(df2)
+ s1 <- seq(1, n, 2)
+ s2 <- seq(2, n, 2)
+ df2 <- cbind(df2, df2[, s1] / df2[, s2])
+ names(df2)[(n+1):ncol(df2)] <- paste(names(df2)[s1], names(df2)[s2], sep="")
+ return(df2)
+ }
>
> set.seed(42)
> df2 <- data.frame(
+ a = sample(1:10, 10e6, T)
+ , b = sample(1:10, 10e6, T)
+ , c = sample(1:10, 10e6, T)
+ , d = sample(1:10, 10e6, T)
+ )
> system.time(foo(df2))
user system elapsed
1.73 0.85 2.57
Upvotes: 0
Reputation: 5536
Try this solution:
df2 <- data.frame(a = 1:10, b = 5:14, c = 10:1, d = 14:5, e = 34:43, f = 56:65)
n <- ncol(df2)
s1 <- seq(1, n, 2)
s2 <- seq(2, n, 2)
df2[paste(names(df2)[s1], names(df2)[s2], sep="")] <- df2[, s1] / df2[, s2]
Upvotes: 5