bob

recursive indexing error in dplyr mutate

I'm just learning dplyr (and R), and I don't understand why this fails or what the correct approach is. I am looking for a general explanation rather than something specific to this contrived dataset.

Assume I have 3 file sizes with multipliers, and I'd like to combine them into a single numeric column.

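library(dplyr)

# One multiplier per unit suffix
m <- data.frame(
    K = 1E3, 
    M = 1E6, 
    G = 1E9
)

# stringsAsFactors = TRUE makes 'mult' a factor, which was the default before R 4.0.0
s <- data.frame(
    size = 1:3,
    mult = c('K', 'M', 'G'),
    stringsAsFactors = TRUE
)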

Now I want to multiply each size by its multiplier, so I tried:

mutate(s, total = size * m[[mult]])

#Error in .subset2(x, i, exact = exact) : 
#    recursive indexing failed at level 2 

which throws an error. I also tried:

mutate(s, total = size * as.numeric(m[mult]))

#1    1    K 1e+06
#2    2    M 2e+09
#3    3    G 3e+03

which is worse than an error: it silently returns the wrong answer!

I tried a lot of other permutations but could not find the answer.

Thanks in advance!


Edit (or should this be another question?):
akrun's answer worked great and I thought I understood, but if I

s <- rbind(s, c(4, NA))

then update the mutate to

mutate(s, total = size * 
    ifelse(is.na(mult), 1,
        unlist(m[as.character(mult)])))

it falls apart again with an "undefined columns selected" error.
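The "undefined columns selected" error comes from how `ifelse()` works: it computes the whole `no` expression for all rows at once and only afterwards picks elements from it, so `m[as.character(mult)]` is still asked for a column named `NA`. A minimal sketch of one workaround, assuming it is acceptable to flatten `m` into a named numeric vector first (`mult_lookup` is a hypothetical name): indexing a named vector with `NA` returns `NA` instead of raising an error, so `ifelse()` can then substitute 1.

```r
library(dplyr)

# Data as in the question, with the extra row already added.
# stringsAsFactors = TRUE reproduces the pre-R-4.0 default the question relied on.
m <- data.frame(K = 1e3, M = 1e6, G = 1e9)
s <- data.frame(size = 1:4, mult = c("K", "M", "G", NA), stringsAsFactors = TRUE)

# Hypothetical helper: m flattened to a named vector c(K = 1e3, M = 1e6, G = 1e9)
mult_lookup <- unlist(m)

# Indexing a named vector with NA gives NA (no error); ifelse() then swaps in 1
s <- mutate(s, total = size * ifelse(is.na(mult), 1, mult_lookup[as.character(mult)]))
s
```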


Answers (2)

akrun

The 'mult' column is of class 'factor'. Convert it to 'character' so that 'm' is subset by column name, `unlist` the result, and then multiply by 'size':

 mutate(s, new= size*unlist(m[as.character(mult)]))
 #  size mult   new
 #1    1    K 1e+03
 #2    2    M 2e+06
 #3    3    G 3e+09
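A small base-R sketch of what the conversion changes, assuming `mult` is a factor (as `data.frame()` made it by default before R 4.0.0): indexing with the factor uses its integer codes, which point into the alphabetically sorted levels, whereas indexing with the character labels matches column names.

```r
m <- data.frame(K = 1e3, M = 1e6, G = 1e9)
mult <- factor(c("K", "M", "G"))    # levels are sorted: "G", "K", "M"

codes <- as.integer(mult)           # 2 3 1: each label's position within levels(mult)

# Factor index: the codes are used as column positions, selecting M, G, K
by_factor <- unlist(m[mult])
# Character index: labels are matched against column names, selecting K, M, G
by_label <- unlist(m[as.character(mult)])

print(codes)
print(by_factor)
print(by_label)
```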

To see why, look at how a 'factor' column behaves as an index: its integer codes, which refer to the sorted 'levels', are used rather than its labels.

 m[s$mult]
 #    M     G    K
 #1 1e+06 1e+09 1000

We get the same column order by using match between names(m) and levels(s$mult):

  m[match(names(m), levels(s$mult))]
  #    M     G    K
  #1 1e+06 1e+09 1000

So this is why you got the multipliers in the wrong order.
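The first attempt, `m[[mult]]`, fails for a separate reason: `[[` selects a single element, and when it is given an index vector of length greater than one it indexes recursively, treating the values as a path (`x[[c(a, b, c)]]` is `x[[a]][[b]][[c]]`). The first step returns a plain numeric column of `m`, which cannot be descended into, hence "recursive indexing failed at level 2". A sketch illustrating this (the list `nested` is purely illustrative):

```r
# `[[` with a vector index walks down a nested structure, one element per level
nested <- list(a = list(b = list(c = 42)))
deep <- nested[[c("a", "b", "c")]]   # same as nested[["a"]][["b"]][["c"]]

# With the question's data frame, level 1 returns the numeric column K,
# which is not a list, so level 2 cannot be indexed and R raises an error
m <- data.frame(K = 1e3, M = 1e6, G = 1e9)
err <- tryCatch(m[[c("K", "M", "G")]], error = function(e) conditionMessage(e))

print(deep)
print(err)
```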


Akhil Nair

If you don't mind changing the data structure of m, you could use

# change m to a table
m = as.data.frame(t(m))
m$mult = rownames(m)
colnames(m)[which(colnames(m) == "V1")] = "value"

# to avoid indexing
s %>% 
  inner_join(m, by = "mult") %>% 
  mutate(total = size*value) %>% 
  select(size, mult, total)

to keep things more dplyr based.

EDIT: Although it works, you may need to be careful about the column data types: 'mult' is a factor in s but a character vector in m, so the two join keys do not have the same type.
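A sketch of the same join idea that avoids both the transposition and the type mismatch, assuming the lookup table can be written out directly (`lookup` is an illustrative name): both key columns are character, `by = "mult"` is given explicitly, and `left_join()` is used instead of `inner_join()` so that a row with a missing multiplier (as in the question's edit) is kept rather than dropped, with `coalesce()` treating it as a multiplier of 1.

```r
library(dplyr)

# Keys are character in both tables, so no type coercion is needed in the join
s <- data.frame(size = 1:4, mult = c("K", "M", "G", NA), stringsAsFactors = FALSE)
lookup <- data.frame(mult = c("K", "M", "G"), value = c(1e3, 1e6, 1e9),
                     stringsAsFactors = FALSE)

res <- s %>%
  left_join(lookup, by = "mult") %>%             # keeps every row of s; unmatched rows get value = NA
  mutate(total = size * coalesce(value, 1)) %>%  # treat a missing multiplier as 1
  select(size, mult, total)

res
```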

