Thiago

Reputation: 672

Most efficient way to multiply a data frame by a vector

What would be the most efficient way to multiply each column of a data frame by a vector?

For example, the data frame (df) has the columns (col1, col2, col3, col4) and the vector (v) has the elements (v1, v2, v3).

I want the output to be: col2*v1, col3*v2, col4*v3

I've been trying df[c(2:4)] * c(v1,v2,v3), but each element of the vector does not seem to multiply every row of its corresponding column.

Upvotes: 2

Views: 2656

Answers (4)

rnso

Reputation: 24535

A simple apply call also works here, processing the data frame row by row (using df and v from Rich Scriven's answer):

df[-1] <- t(apply(df[-1], 1, FUN = function(x) x * v))
df
#   a  x  y   z
# 1 a  5 40 105
# 2 b 10 50 120
# 3 c 15 60 135
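
The outer t() is needed because apply over rows returns each row's result as a column. A minimal sketch of that intermediate step, assuming the same toy df and v as in Rich Scriven's answer (the names raw and fixed are only illustrative):

```r
df <- data.frame(a = letters[1:3], x = 1:3, y = 4:6, z = 7:9)
v <- c(5, 10, 15)

# Each row of df[-1] is multiplied elementwise by v; apply stores the
# result for row i as COLUMN i of its output, so the matrix comes back
# with rows and columns swapped relative to df[-1]
raw <- apply(df[-1], 1, function(x) x * v)

# Transposing restores the original orientation (rows = observations)
fixed <- t(raw)
fixed
```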

Upvotes: 1

josliber

Reputation: 44320

You could try (using df and v from Richard Scriven's answer):

df[-1] <- t(t(df[-1]) * v)
df
#   a  x  y   z
# 1 a  5 40 105
# 2 b 10 50 120
# 3 c 15 60 135

When you multiply a matrix by a vector, R recycles the vector down each column, so v[i] ends up multiplying row i. Since you want v[i] to multiply column i instead, we transpose df[-1] using t, multiply by v, and transpose back using t.
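
The recycling behaviour is easy to see on a small matrix of ones. This is only an illustrative sketch; the matrix m and the names by_row and by_col are made up for the example:

```r
# Toy inputs (hypothetical, not the data from the question)
m <- matrix(1, nrow = 3, ncol = 3)
v <- c(5, 10, 15)

# Plain multiplication recycles v down each column, so v[i] scales row i
by_row <- m * v
by_row
#      [,1] [,2] [,3]
# [1,]    5    5    5
# [2,]   10   10   10
# [3,]   15   15   15

# Transposing before and after makes v[j] scale column j instead
by_col <- t(t(m) * v)
by_col
#      [,1] [,2] [,3]
# [1,]    5   10   15
# [2,]    5   10   15
# [3,]    5   10   15
```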

On a 1000 x 100 data frame, this approach benchmarks somewhat faster than the Map approach and considerably faster than sweep:

library(microbenchmark)
rscriven <- function(df, v) cbind(df[1], Map(`*`, df[-1], v))
josilber <- function(df, v) cbind(df[1], t(t(df[-1]) * v))
dardisco <- function(df, v) cbind(df[1], sweep(df[-1], MARGIN=2, STATS=v, FUN="*"))
df2 <- cbind(data.frame(rep("a", 1000)), matrix(rnorm(100000), nrow=1000))
v2 <- rnorm(100)
all.equal(rscriven(df2, v2), josilber(df2, v2))
# [1] TRUE
all.equal(rscriven(df2, v2), dardisco(df2, v2))
# [1] TRUE

microbenchmark(rscriven(df2, v2), josilber(df2, v2), dardisco(df2, v2))
# Unit: milliseconds
#               expr       min        lq    median        uq        max neval
#  rscriven(df2, v2)  5.276458  5.378436  5.451041  5.587644   9.470207   100
#  josilber(df2, v2)  2.545144  2.753363  3.099589  3.704077   8.955193   100
#  dardisco(df2, v2) 11.647147 12.761184 14.196678 16.581004 132.428972   100

Thanks to @thelatemail for pointing out that the Map approach is a good deal faster for 100x larger data frames:

df2 <- cbind(data.frame(rep("a", 10000)), matrix(rnorm(10000000), nrow=10000))
v2 <- rnorm(1000)
microbenchmark(rscriven(df2, v2), josilber(df2, v2), dardisco(df2, v2))
# Unit: milliseconds
#               expr       min         lq     median        uq       max neval
#  rscriven(df2, v2)  75.74051   90.20161   97.08931  115.7789  259.0855   100
#  josilber(df2, v2) 340.72774  388.17046  498.26836  514.5923  623.4020   100
#  dardisco(df2, v2) 928.81128 1041.34497 1156.39293 1271.4758 1506.0348   100

So the fastest approach depends on the size of your data; you'll need to benchmark to determine which one is fastest for your application.

Upvotes: 5

dardisco

Reputation: 5274

Not as fast, but more flexible:

sweep(df[-1], MARGIN=2, STATS=v, FUN="*")
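
As a rough illustration of that flexibility, the same call handles other operations just by changing FUN or STATS. The sketch below assumes the toy df and v from Rich Scriven's answer; divided and centered are made-up names:

```r
df <- data.frame(a = letters[1:3], x = 1:3, y = 4:6, z = 7:9)
v <- c(5, 10, 15)

# Same call shape, different FUN: divide each column by its entry in v
divided <- sweep(df[-1], MARGIN = 2, STATS = v, FUN = "/")

# STATS can also be computed from the data, e.g. subtract each column's mean
centered <- sweep(df[-1], MARGIN = 2, STATS = colMeans(df[-1]), FUN = "-")

divided
centered
```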

Upvotes: 2

Rich Scriven

Reputation: 99331

You can use Map for this. Here's an example:

( df <- data.frame(a = letters[1:3], x = 1:3, y = 4:6, z = 7:9) )
#   a x y z
# 1 a 1 4 7
# 2 b 2 5 8
# 3 c 3 6 9
v <- c(5, 10, 15)
cbind(df[1], Map(`*`, df[-1], v))
#   a  x  y   z
# 1 a  5 40 105
# 2 b 10 50 120
# 3 c 15 60 135

In this example,

  • column x is multiplied by v[1] (5)
  • column y is multiplied by v[2] (10)
  • column z is multiplied by v[3] (15)
  • cbind is used to attach the unused column a to the columns we operated on
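
If you would rather update the data frame in place than build a new one with cbind, the list returned by Map can be assigned back to the numeric columns. This is a small variation on the example above, not a different method; scaled is just an illustrative name:

```r
df <- data.frame(a = letters[1:3], x = 1:3, y = 4:6, z = 7:9)
v <- c(5, 10, 15)

# Map pairs column j of df[-1] with v[j] and returns a named list of products
scaled <- Map(`*`, df[-1], v)
str(scaled)

# Assigning the list back overwrites x, y and z and leaves column a untouched
df[-1] <- scaled
df
```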

Upvotes: 3
