Reputation: 672
What would be the most efficient way to multiply each column of a data frame by a vector?
e.g. the data frame (df) has the following columns (col1, col2, col3, col4) and the vector (v) has the following elements (v1, v2, v3).
I want the output to be: col2*v1, col3*v2, col4*v3
I've been trying df[c(2:4)] * c(v1,v2,v3) but it seems like the elements of the vector are not multiplying every single row of each column.
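A likely explanation for the behaviour described above: when a data frame is multiplied by a plain vector, R recycles the vector down each column rather than across the columns. The sketch below uses made-up sample data (the names df and v and their values are assumptions for illustration, not the asker's actual data).

```r
# Sample data (assumed for illustration): one label column, three numeric columns
df <- data.frame(a = letters[1:3], x = 1:3, y = 4:6, z = 7:9)
v <- c(5, 10, 15)

# The vector is recycled down each column, so v[i] scales row i
# instead of v[j] scaling column j
wrong <- df[2:4] * v
wrong$x  # 5 20 45, not the intended 5 10 15
```

So the multiplication does run, but it pairs vector elements with rows instead of columns.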
Upvotes: 2
Views: 2656
Reputation: 24535
A simple apply call can also be used here, working row by row:
df[-1] <- t(apply(df[-1], 1, FUN = function(x) x * v))
df
a x y z
1 a 5 40 105
2 b 10 50 120
3 c 15 60 135
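The t() around apply is needed because apply over rows returns each row's result as a column. A minimal sketch with a non-square example makes this visible (the names d and w and their values are assumptions for illustration):

```r
# Hypothetical data: 2 rows, 3 numeric columns
d <- data.frame(x = 1:2, y = 3:4, z = 5:6)
w <- c(5, 10, 15)

# Each row is multiplied elementwise by w; results come back as columns
raw <- apply(d, 1, function(r) r * w)
dim(raw)    # 3 2: one column per original row

# Transposing restores the original orientation (2 rows, 3 columns)
fixed <- t(raw)
dim(fixed)  # 2 3
```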
Upvotes: 1
Reputation: 44320
You could try (using df and v from Richard Scriven's answer):
df[-1] <- t(t(df[-1]) * v)
df
# a x y z
# 1 a 5 40 105
# 2 b 10 50 120
# 3 c 15 60 135
When you multiply a matrix by a vector, R recycles the vector down the columns, so v[i] ends up scaling row i. Since you want v[j] to scale column j instead, we transpose df[-1] using t, multiply by v, and transpose back using t.
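To make the recycling rule concrete, here is a small sketch on a plain matrix (the names m, r and w and their values are assumptions chosen for illustration):

```r
# Hypothetical 2 x 3 matrix, filled column by column: columns are (1,2), (3,4), (5,6)
m <- matrix(1:6, nrow = 2)

# A length-2 vector is recycled down the columns: row 1 is scaled by 10, row 2 by 100
r <- c(10, 100)
by_row <- m * r

# To scale columns instead, transpose, multiply, and transpose back
w <- c(1, 10, 100)
by_col <- t(t(m) * w)
by_col[1, ]  # 1 30 500
by_col[2, ]  # 2 40 600
```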
It seems like this approach has a slight edge in benchmarking over the Map approach, and a significant advantage over sweep:
library(microbenchmark)
rscriven <- function(df, v) cbind(df[1], Map(`*`, df[-1], v))
josilber <- function(df, v) cbind(df[1], t(t(df[-1]) * v))
dardisco <- function(df, v) cbind(df[1], sweep(df[-1], MARGIN=2, STATS=v, FUN="*"))
df2 <- cbind(data.frame(rep("a", 1000)), matrix(rnorm(100000), nrow=1000))
v2 <- rnorm(100)
all.equal(rscriven(df2, v2), josilber(df2, v2))
# [1] TRUE
all.equal(rscriven(df2, v2), dardisco(df2, v2))
# [1] TRUE
microbenchmark(rscriven(df2, v2), josilber(df2, v2), dardisco(df2, v2))
# Unit: milliseconds
# expr min lq median uq max neval
# rscriven(df2, v2) 5.276458 5.378436 5.451041 5.587644 9.470207 100
# josilber(df2, v2) 2.545144 2.753363 3.099589 3.704077 8.955193 100
# dardisco(df2, v2) 11.647147 12.761184 14.196678 16.581004 132.428972 100
Thanks to @thelatemail for pointing out that the Map approach is a good deal faster for 100x larger data frames:
df2 <- cbind(data.frame(rep("a", 10000)), matrix(rnorm(10000000), nrow=10000))
v2 <- rnorm(1000)
microbenchmark(rscriven(df2, v2), josilber(df2, v2), dardisco(df2, v2))
# Unit: milliseconds
# expr min lq median uq max neval
# rscriven(df2, v2) 75.74051 90.20161 97.08931 115.7789 259.0855 100
# josilber(df2, v2) 340.72774 388.17046 498.26836 514.5923 623.4020 100
# dardisco(df2, v2) 928.81128 1041.34497 1156.39293 1271.4758 1506.0348 100
It seems like you'll need to benchmark to determine which approach is fastest for your application.
Upvotes: 5
Reputation: 5274
Not as fast, but more flexible:
sweep(df[-1], MARGIN=2, STATS=v, FUN="*")
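One way to read "more flexible": sweep accepts any binary function and any per-column statistics, not only multiplication by a fixed vector. A minimal sketch, assuming the same sample df and v used in the Map answer (the centering step is an added illustration, not part of the original answer):

```r
df <- data.frame(a = letters[1:3], x = 1:3, y = 4:6, z = 7:9)
v <- c(5, 10, 15)

# Multiply column j by v[j] (MARGIN = 2 means "sweep across columns")
scaled <- sweep(df[-1], MARGIN = 2, STATS = v, FUN = "*")

# Same call shape, different operation: subtract each column's mean
centered <- sweep(df[-1], MARGIN = 2, STATS = colMeans(df[-1]), FUN = "-")
centered$x  # -1 0 1
```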
Upvotes: 2
Reputation: 99331
You can use Map for this. Here's an example:
> ( df <- data.frame(a = letters[1:3], x = 1:3, y = 4:6, z = 7:9) )
# a x y z
# 1 a 1 4 7
# 2 b 2 5 8
# 3 c 3 6 9
> v <- c(5, 10, 15)
> cbind(df[1], Map(`*`, df[-1], v))
# a x y z
# 1 a 5 40 105
# 2 b 10 50 120
# 3 c 15 60 135
In this example,
x is multiplied by v[1] (5)
y is multiplied by v[2] (10)
z is multiplied by v[3] (15)
cbind is used to attach the unused column a to the columns we operated on
Upvotes: 3
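As a small follow-up sketch of what Map does in the example above: it pairs the j-th column with the j-th element of v and returns a named list, which can also be assigned back into the data frame in place. The in-place assignment is a variation added here for illustration, not something the answer itself proposes.

```r
df <- data.frame(a = letters[1:3], x = 1:3, y = 4:6, z = 7:9)
v <- c(5, 10, 15)

# Map returns a list with one scaled vector per column, keeping the column names
scaled <- Map(`*`, df[-1], v)
names(scaled)  # "x" "y" "z"

# A list of equal-length vectors can be assigned back over the same columns
df[-1] <- scaled
df$z           # 105 120 135
```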