Error when running R function with rpy2

Question

I'm trying to use rpy2 to run the multi.split function from the questionr package.

This is my code:

from rpy2 import robjects
from rpy2.robjects.packages import importr

questionr = importr(str('questionr'))

data = ["red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green"]
data_vector = robjects.StrVector(data)
multi_split = questionr.multi_split
data_table = multi_split(data_vector, split_char='/')

After the last line I get the following error:

RRuntimeError: Error in `colnames<-`(`*tmp*`, value = c("c(\"red/blue\",_\"green\",_\"red/green\",_\"blue/red\",_\"red/blue\",_\"green\",_.blue",  : 
 'names' attribute [4] must be the same length as the vector [3]

I think it has something to do with the size of the vector I'm sending, because if I remove the last item:

data = ["red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue"]

and then run

data_vector = robjects.StrVector(data)
multi_split = questionr.multi_split
data_table = multi_split(data_vector, split_char='/')

I get no error message. Also, if I change the split_char argument, for example:

data_table = multi_split(data_vector, split_char='.')

I get no error message, no matter what size of vector I'm sending.

I have tried running the equivalent code directly in R (with RStudio) and it runs with no problems. Any ideas on how I can solve this issue?

lgautier · Accepted Answer

This seems to be because the function multi_split (multi.split in the R package) is trying to use the string representation of the expression associated with the first argument ("data_vector" here).

The signature of the R function is:

multi.split(var, split.char = "/", mnames = NULL)

and the documentation for mnames is:

names to give to the produced variabels. If NULL, the name are computed from the original variable name and the answers.

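In the call multi_split(data_vector, split_char='/') the embedded R cannot see a variable name: this is a Python call, and the object behind data_vector reaches R as an anonymous value (only content, no variable name). That is why the vector's contents, rather than a name, show up in the column names quoted in the error message.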

I thought that you could specify mnames, but you checked and this is not working (see comments below). That is what the code seems to say: the line vname <- deparse(substitute(var)) is evaluated whether or not mnames is specified: https://github.com/juba/questionr/blob/9cf09f3ffcd6c8df24182380f12d52b061c221ef/R/table.multi.R#L161

The alternative is to work out the use of an R expression. An older post should provide the necessary bits for that: What object to pass to R from rpy2?

A third possibility is to creatively mix Python-strings-as-R-code:

data = ["red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green"]
data_vector = robjects.StrVector(data)
# binding the R vector to a symbol in R's "GlobalEnv"
robjects.globalenv['mydata'] = data_vector
# the call is now in a Python string that is evaluated as R code
# (the R code must use the R-side names: the symbol mydata and the argument split.char)
data_table = robjects.r("multi.split(mydata, split.char='/')")
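
The pattern above (bind the vector to a symbol, then evaluate a call that refers to that symbol by name) can be wrapped in a small helper. The following is only a sketch: build_r_call, r_string and call_with_symbol are hypothetical names that are not part of rpy2, the string quoting handles only simple cases, and the blanket underscore-to-dot renaming is an assumption that fits multi.split but would be wrong for R arguments that genuinely contain underscores.

```python
def r_string(value):
    """Quote a Python str as an R string literal (simple escaping only)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def build_r_call(func_name, symbol, **kwargs):
    """Return R source for func_name(symbol, arg.name = "value", ...).

    Underscores in keyword names are turned into dots (split_char -> split.char).
    Only string-valued keyword arguments are supported in this sketch.
    """
    parts = [symbol]
    for name, value in kwargs.items():
        parts.append(name.replace("_", ".") + " = " + r_string(value))
    return func_name + "(" + ", ".join(parts) + ")"


def call_with_symbol(func_name, symbol, r_object, **kwargs):
    """Bind r_object to `symbol` in R's global environment, then evaluate the call."""
    from rpy2 import robjects  # imported lazily; requires rpy2 and R to be installed

    robjects.globalenv[symbol] = r_object
    return robjects.r(build_r_call(func_name, symbol, **kwargs))


# Building the R source needs neither rpy2 nor R:
print(build_r_call("multi.split", "mydata", split_char="/"))
# prints: multi.split(mydata, split.char = "/")
```

With rpy2 available and multi.split visible in the embedded R session, call_with_symbol("multi.split", "mydata", data_vector, split_char="/") would then perform the binding and the evaluation in one step, so the R function sees the name mydata instead of an anonymous vector.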
