Reputation: 716
I'm trying to use rpy2 to run the multi.split function from the questionr package.
This is my code:
from rpy2 import robjects
from rpy2.robjects.packages import importr
questionr = importr(str('questionr'))
data = ["red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green"]
data_vector = robjects.StrVector(data)
multi_split = questionr.multi_split
data_table = multi_split(data_vector, split_char='/')
After the last line I get the following error:
RRuntimeError: Error in `colnames<-`(`*tmp*`, value = c("c(\"red/blue\",_\"green\",_\"red/green\",_\"blue/red\",_\"red/blue\",_\"green\",_.blue", :
'names' attribute [4] must be the same length as the vector [3]
I think it has something to do with the size of the vector I'm sending, because if I remove the last item:
data = ["red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue"]
and then run:
data_vector = robjects.StrVector(data)
multi_split = questionr.multi_split
data_table = multi_split(data_vector, split_char='/')
I get no error message. Also, if I change the split_char argument, for example:
data_table = multi_split(data_vector, split_char='.')
I get no error message, no matter what size of vector I send.
I have tried running the equivalent code directly in R (with RStudio) and it runs with no problems. Any ideas on how I can solve this issue?
Upvotes: 0
Views: 926
Reputation: 11545
This seems to be because the function multi_split (multi.split in the R package) is trying to use the string representation of the expression associated with the first argument ("data_vector" here).
The signature of the R function is:
multi.split(var, split.char = "/", mnames = NULL)
and the documentation for mnames is:
names to give to the produced variabels. If NULL, the name are computed from the original variable name and the answers.
In the call multi_split(data_vector, split_char='/') the embedded R cannot see the variable name, as this is a Python call and data_vector is an anonymous variable (only content, no variable name).
I thought that you could specify mnames, but you checked and this is not working (see comments below). That is what the code seems to say: the line vname <- deparse(substitute(var)) is evaluated whether or not mnames is specified: https://github.com/juba/questionr/blob/9cf09f3ffcd6c8df24182380f12d52b061c221ef/R/table.multi.R#L161
One alternative is to work out the use of an R expression. An older post should provide the necessary bits for that: "What object to pass to R from rpy2?"
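In the same spirit, a small R wrapper function can hand multi.split a real symbol to deparse. This is only a sketch and not the technique from the linked post: the wrapper, the argument name v, and the Python helper split_choices are made up for illustration, and it assumes rpy2, R and the questionr package are installed. The import is done lazily so the module itself loads without R.

```python
# R source for a hypothetical wrapper. Inside it, multi.split is called with
# the symbol `v`, so deparse(substitute(var)) sees a short name instead of
# the deparsed contents of an anonymous vector.
R_WRAPPER_SOURCE = """
function(v, split.char = "/") {
    questionr::multi.split(v, split.char = split.char)
}
"""


def split_choices(values, split_char="/"):
    """Call questionr::multi.split through the wrapper above.

    Requires rpy2, R and questionr; `values` is a list of Python strings.
    """
    from rpy2 import robjects  # lazy import: only needed when actually called

    wrapper = robjects.r(R_WRAPPER_SOURCE)
    # Arguments are passed positionally, so no name translation is involved.
    return wrapper(robjects.StrVector(values), split_char)
```

With this, split_choices(data) would replace the direct multi_split(data_vector, split_char='/') call; the produced column names would presumably be derived from v rather than from the vector contents.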
A third possibility is to creatively mix Python-strings-as-R-code:
data = ["red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green", "red/green", "blue/red", "red/blue", "green"]
data_vector = robjects.StrVector(data)
# bind the R vector to the symbol "mydata" in R's "GlobalEnv"
robjects.globalenv['mydata'] = data_vector
# the call is now a Python string evaluated as R code, so it must use the
# R-side symbol (mydata) and the R argument name (split.char)
data_table = robjects.r("multi.split(mydata, split.char='/')")
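If this string-based approach is used in more than one place, it may help to wrap it in a helper that builds the R call and checks the pieces interpolated into it. The following is a minimal sketch under assumptions: the function names, the default symbol mydata, and the validation rules are hypothetical, and running multi_split_via_globalenv needs rpy2, R and questionr.

```python
import re

# Conservative pattern for an R symbol: a letter, then letters, digits, "." or "_".
_R_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9._]*$")


def build_multi_split_call(symbol, split_char="/"):
    """Return R code calling multi.split on a variable bound in R."""
    if not _R_NAME.match(symbol):
        raise ValueError(f"not a simple R symbol: {symbol!r}")
    if "'" in split_char or "\\" in split_char:
        raise ValueError("split_char must not contain quotes or backslashes")
    return f"questionr::multi.split({symbol}, split.char='{split_char}')"


def multi_split_via_globalenv(values, symbol="mydata", split_char="/"):
    """Bind `values` to `symbol` in R's global environment, then evaluate the call."""
    from rpy2 import robjects  # lazy import: only needed when actually called

    code = build_multi_split_call(symbol, split_char)
    robjects.globalenv[symbol] = robjects.StrVector(values)
    return robjects.r(code)
```

For example, build_multi_split_call("mydata") produces the string questionr::multi.split(mydata, split.char='/'), which is then evaluated by robjects.r. Note that the binding stays in R's global environment after the call.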
Upvotes: 1