Fabio Correa

Reputation: 1363

How to multiply a single number by a mixed format column in data.table

I have the following data.table and want to multiply column a by column b. Each element of a is always a single number, while an element of b may sometimes be a vector:

library(data.table)
tt <- c(33,44)
dt <- data.table(a=list(1,2,3)
                 , b = list(11,22,tt))

dt[, t2 := sapply(b, function(x) x*a)] 

I get this error:

Error in x * a : non-numeric argument to binary operator

Because each element of a is a single number, I expected this to work even for row 3, where b holds a vector.

The solution I found is to use mapply:

dt[, t2 := mapply(function(x,y) x*y, a, b)]

Why does it not work with sapply/lapply?

Upvotes: 1

Views: 84

Answers (1)

r2evans

Reputation: 160447

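Does dt$a[[1]] * dt$b work? (No.) Although the first operand is a numeric vector of length 1, the second is a list rather than an atomic vector, and arithmetic operators don't work on lists. That is exactly what happens in your sapply call: x is one element of b, but a inside the function is still the entire list column, so x*a tries to multiply a number by a list. sapply iterates over only one list/vector, so while flipping it around to sapply(a, function(AA) AA * b) might seem like a good start, b is then still the whole list and the multiplication fails the same way.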
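A minimal base-R sketch of the failure, assuming plain lists a and b that hold the same values as the question's columns (data.table is not needed to see the problem):

```r
# Plain lists standing in for the data.table columns a and b
# (assumption: same values as in the question).
a <- list(1, 2, 3)
b <- list(11, 22, c(33, 44))

# Inside sapply(b, function(x) x * a), x is one element of b,
# but a is still the whole list, so R attempts numeric * list.
failed <- tryCatch(b[[1]] * a, error = function(e) e)
print(conditionMessage(failed))

# Pulling out the matching element of a first makes the arithmetic valid.
ok <- b[[3]] * a[[3]]
print(ok)
```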

What you are trying to do is multiply a[[1]] by b[[1]], then a[[2]] by b[[2]], and so on. Walking two lists in parallel like this is exactly what Map and mapply are for.
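Written out as an explicit loop (a sketch, again assuming plain lists a and b with the question's values), the pairing looks like this, and Map produces the same list:

```r
a <- list(1, 2, 3)
b <- list(11, 22, c(33, 44))

# Explicit version: pair a[[i]] with b[[i]] for each position i.
looped <- vector("list", length(a))
for (i in seq_along(a)) {
  looped[[i]] <- a[[i]] * b[[i]]
}

# Map walks both lists in parallel and always returns a list.
mapped <- Map(`*`, a, b)

str(mapped)
identical(looped, mapped)
```

With these inputs mapply(`*`, a, b) returns the same list, because the third product has length 2 and so mapply cannot simplify the result.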

Here is how these functions relate to each other:

## equivalent
lapply(lst1, function(z) z + 1)
Map(function(z) z + 1, lst1)

## equivalent
sapply(lst1, function(z) z + 1)
mapply(function(z) z + 1, lst1)
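A quick check of those equivalences, assuming an unnamed example list lst1 (the answer leaves lst1 undefined, so the values here are made up for illustration):

```r
# Hypothetical example input; any unnamed list of numbers behaves the same way.
lst1 <- list(1, 2, 3)

# lapply and Map both return list(2, 3, 4).
same_list <- identical(lapply(lst1, function(z) z + 1),
                       Map(function(z) z + 1, lst1))

# sapply and mapply both simplify to the numeric vector c(2, 3, 4).
same_vec <- identical(sapply(lst1, function(z) z + 1),
                      mapply(function(z) z + 1, lst1))

print(c(same_list = same_list, same_vec = same_vec))
```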

That's it for processing a single vector. When you want to iterate over two or more vectors/lists at the same time, "zipping" them together, there are two common approaches: loop over an index with sapply, or let mapply walk the lists in parallel:

stopifnot(length(lst1) == length(lst2))

## equivalent
sapply(seq_along(lst1), function(ind) {
  lst1[[ind]] * lst2[[ind]]
})
mapply(function(o1, o2) o1 * o2, lst1, lst2)
mapply(`*`, lst1, lst2)
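To see that the three zipping forms agree, here is a sketch with made-up lst1 and lst2 that mirror the question's a and b columns (the values are an assumption):

```r
lst1 <- list(1, 2, 3)
lst2 <- list(11, 22, c(33, 44))
stopifnot(length(lst1) == length(lst2))

# Index-based: sapply over positions, pulling the ind-th element of each list.
r1 <- sapply(seq_along(lst1), function(ind) lst1[[ind]] * lst2[[ind]])
# mapply with an anonymous function, and with the `*` operator directly.
r2 <- mapply(function(o1, o2) o1 * o2, lst1, lst2)
r3 <- mapply(`*`, lst1, lst2)

# The result lengths are 1, 1 and 2, so none of them can simplify:
# all three return the same unnamed list.
identical(r1, r2) && identical(r2, r3)
```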

Commonalities and differences to know about them:

  • sapply and mapply try to simplify the return value when possible: they return a vector if every result has length 1, a matrix if every result has the same length greater than 1, and a list if the lengths differ. You can force a list with sapply(..., simplify=FALSE) or mapply(..., SIMPLIFY=FALSE) (note the difference in case).
  • lapply and Map always return lists, regardless of the above conditions; many find this output consistency more reliable/desirable in a programmatic sense (i.e., in functions/packages).
  • lapply returns a named list only if the input vector/list is named; otherwise the result can be indexed only by position. The others (sapply, mapply, Map) also name the result when the input (for mapply and Map, the first vector/list) is named, and additionally use the values themselves as names when the input is a character vector. (There may be more rules/exceptions, but it's a start.)
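A small sketch of those simplification and naming rules, using toy inputs (the functions are base R; the inputs are arbitrary examples):

```r
# Every result has length 1: simplified to a plain numeric vector.
v <- sapply(1:3, function(i) i * 10)
# Every result has the same length (2): simplified to a 2 x 3 matrix.
m <- sapply(1:3, function(i) c(i, i^2))
# Results of different lengths (1, 2, 3): left as a list.
l <- sapply(1:3, function(i) seq_len(i))

# simplify = FALSE / SIMPLIFY = FALSE always keep the list.
l2 <- sapply(1:3, function(i) i * 10, simplify = FALSE)
l3 <- mapply(function(x, y) x + y, 1:3, 4:6, SIMPLIFY = FALSE)

# Naming with an unnamed character input.
nm_l <- names(lapply(c("x", "yy"), nchar))  # lapply: no names
nm_s <- names(sapply(c("x", "yy"), nchar))  # sapply: values used as names
nm_m <- names(Map(nchar, c("x", "yy")))     # Map: values used as names

str(list(v = v, m = m, l = l, nm_l = nm_l, nm_s = nm_s, nm_m = nm_m))
```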

Upvotes: 1
