shlomiLan

Reputation: 716

Error when running R function with rpy2

I'm trying to use rpy2 to run the multi.split function from the questionr package.

This is my code:

from rpy2 import robjects
from rpy2.robjects.packages import importr

questionr = importr(str('questionr'))

data = ["red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green"]
data_vector = robjects.StrVector(data)
multi_split = questionr.multi_split
data_table = multi_split(data_vector, split_char='/')

After the last line, I get the following error:

RRuntimeError: Error in `colnames<-`(`*tmp*`, value = c("c(\"red/blue\",_\"green\",_\"red/green\",_\"blue/red\",_\"red/blue\",_\"green\",_.blue",  : 
 'names' attribute [4] must be the same length as the vector [3]

I think it has something to do with the size of the vector I'm sending, because if I remove the last item

data = ["red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue"]

and then run

data_vector = robjects.StrVector(data)
multi_split = questionr.multi_split
data_table = multi_split(data_vector, split_char='/')

I get no error message. Also, if I change the split_char argument, for example:

data_table = multi_split(data_vector, split_char='.')

I get no error message, no matter what size of array I send.

I have tried running the equivalent code directly in R (with RStudio) and it runs with no problems. Any ideas on how I can solve this issue?

Upvotes: 0

Views: 926

Answers (1)

lgautier

Reputation: 11545

This seems to be because the function multi_split (multi.split in the R package) is trying to use the string representation of the expression associated with the first argument ("data_vector" here).

The signature of the R function is:

multi.split(var, split.char = "/", mnames = NULL)

and the documentation for mnames is:

names to give to the produced variabels. If NULL, the name are computed from the original variable name and the answers.

In the call multi_split(data_vector, split_char='/') the embedded R cannot see the variable name as this is a Python call and data_vector is an anonymous variable (only content, no variable name).

I thought that you could specify mnames, but you checked and this is not working (see comments below). That matches what the code seems to say: the line vname <- deparse(substitute(var)) is evaluated whether or not mnames is specified: https://github.com/juba/questionr/blob/9cf09f3ffcd6c8df24182380f12d52b061c221ef/R/table.multi.R#L161

The alternative is to work out the use of an R expression. An older post should provide the necessary bits for that: What object to pass to R from rpy2?

A third possibility is to creatively mix Python-strings-as-R-code:

data = ["red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green"]
data_vector = robjects.StrVector(data)
# binding the R vector to a symbol in R's "GlobalEnv"
robjects.globalenv['mydata'] = data_vector
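# the call is now in a Python string that is evaluated as R code,
# so it must use the R-side names: the symbol bound above (mydata)
# and the dotted argument name (split.char, not split_char)
data_table = robjects.r("multi.split(mydata, split.char='/')")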
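When the call is written as a string of R code, it is easy to slip back into the Python-side spellings. Here is a minimal sketch of a helper that builds the R call from a symbol name and Python-style keyword arguments. The names build_r_call and multi_split_by_name are hypothetical (they are not part of rpy2), only string-valued arguments are handled, and the second function assumes questionr is installed in the R library that rpy2 uses.

```python
def build_r_call(func_name, symbol, **kwargs):
    """Return R source for func_name(symbol, arg = "value", ...).

    Underscores in keyword names become dots (split_char -> split.char).
    Only string values are supported in this sketch.
    """
    parts = [symbol]
    for name, value in kwargs.items():
        # Escape backslashes and double quotes for an R string literal.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        parts.append('{} = "{}"'.format(name.replace("_", "."), escaped))
    return "{}({})".format(func_name, ", ".join(parts))


def multi_split_by_name(data, symbol="mydata"):
    """Bind data to an R symbol, then call multi.split on that symbol."""
    # Imported lazily so build_r_call can be used without R installed.
    from rpy2 import robjects

    # Give the vector a name in R's global environment, so that
    # deparse(substitute(var)) inside multi.split yields that name.
    robjects.globalenv[symbol] = robjects.StrVector(data)
    # The questionr:: prefix avoids relying on the package being attached.
    call = build_r_call("questionr::multi.split", symbol, split_char="/")
    return robjects.r(call)


print(build_r_call("multi.split", "mydata", split_char="/"))
# prints: multi.split(mydata, split.char = "/")
```

With R and questionr available, multi_split_by_name(data) should evaluate the same call as the snippet above, with the variable name visible to R.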

Upvotes: 1
