John Stone

Call R package data using Python with rpy2

I want to use the Auto data set from the R package ISLR in Python. Following the "Introduction to rpy2" documentation, I tried the following:

from rpy2 import robjects
from rpy2.robjects.packages import importr, data
from rpy2.robjects import pandas2ri
pandas2ri.activate()

datasets = importr('datasets') # data(mtcars) in library(datasets)
mtcars = data(datasets).fetch('mtcars')['mtcars']

ISLR = importr('ISLR') # data(Auto) in library(ISLR)
Auto = data(ISLR).fetch('Auto')['Auto']

# r_df = mtcars  # works
r_df = Auto  # fails with the error below

df = pandas2ri.ri2py(robjects.DataFrame(r_df))
df.info()

Loading data(mtcars) from library(datasets) works, but loading data(Auto) from library(ISLR) fails with this error:

Parameter 'categories' must be list-like

How can I fix this issue?


Answers (1)

sreedta

What version of rpy2 are you using? I'm using rpy2 3.3.6, installed with pip in a Conda environment alongside R 4.0.3 and Python 3.6.11 (both from conda-forge), and I can read both mtcars from datasets and Auto from ISLR. My results are below.
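To answer the version question quickly, a small sketch like the following prints the rpy2 and R versions and flags rpy2 releases older than 3.3.0 (the threshold recommended in this answer). The helper names are hypothetical; the only library facts it relies on are `rpy2.__version__` and R's `R.version.string`. rpy2 is imported inside `report_versions()` so the parsing helpers work even where rpy2 is not installed.

```python
import re


def parse_version(text: str) -> tuple:
    """Turn a version string such as '3.3.6' or '3.4.0.dev0' into (3, 3, 6) / (3, 4, 0)."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits of each piece
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def rpy2_is_recent(installed: str, minimum: tuple = (3, 3, 0)) -> bool:
    """True if `installed` is at least `minimum` (tuples compare element by element)."""
    return parse_version(installed) >= minimum


def report_versions() -> None:
    """Print the rpy2 and R versions seen by this Python process."""
    # Imported here so the helpers above work even without rpy2 installed.
    import rpy2
    import rpy2.robjects as ro

    print("rpy2:", rpy2.__version__)
    print("R:", ro.r("R.version.string")[0])
    if not rpy2_is_recent(rpy2.__version__):
        print("rpy2 is older than 3.3.0; consider upgrading (pip install -U rpy2)")
```

Calling `report_versions()` in the environment where the error occurs shows which versions are actually in play.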

I think the error you are seeing is either a bug or a side effect of your configuration or dependencies. Upgrade rpy2 to 3.3.0 or later and check the dependencies carefully.
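The message in the question is a pandas error: pandas raises it when a `Categorical` is built with a `categories` argument that is not list-like. rpy2's pandas converter maps R factors to pandas Categoricals, and in ISLR the `Auto$name` column is a factor while every mtcars column is numeric, so one plausible (unconfirmed) explanation is that the older converter passed the factor levels to pandas in a form it rejects. The sketch below uses only pandas and made-up (hypothetical) factor codes and levels to show the conversion that should happen and to reproduce the same message.

```python
import pandas as pd

# Hypothetical R factor: 1-based integer codes plus a vector of level labels.
r_codes = [2, 1, 2]
r_levels = ["buick skylark 320", "chevrolet chevelle malibu"]

# pandas expects 0-based codes and list-like categories.
names = pd.Categorical.from_codes([c - 1 for c in r_codes], categories=r_levels)


def categories_error(categories):
    """Return the pandas error message for these categories, or None if they are accepted."""
    try:
        pd.Categorical.from_codes([0], categories=categories)
    except (TypeError, ValueError) as exc:
        return str(exc)
    return None


# Passing a bare string instead of a list triggers the message seen in the question.
message = categories_error("buick skylark 320")
```

If upgrading is not an option, this suggests checking how factor columns are converted; a data frame without factors (like mtcars) never reaches this code path.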

See this post for how these functions have changed across rpy2 versions: "Pandas - how to convert r dataframe back to pandas?"

Here is the entire sequence from my command line:

Python 3.6.11 | packaged by conda-forge | (default, Aug 5 2020, 20:09:42)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

Importing relevant libraries

>>> import rpy2.robjects as ro
>>> import rpy2.robjects.packages as rpackages
>>> from rpy2.robjects.vectors import StrVector
>>> from rpy2.robjects.packages import importr, data

Importing packages and reading in the data

>>> datasets = importr('datasets')
>>> mtcars = data(datasets).fetch('mtcars')['mtcars']

>>> ISLR = importr('ISLR')
>>> Auto = data(ISLR).fetch('Auto')['Auto']

>>> r_df_mtcars = mtcars  # names label where each data set comes from
>>> r_df_Auto = Auto

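Converting R data frames into pandas DataFrames
Note: conversion.rpy2py is part of the rpy2 3.x API and replaces the older ri2py used in the question. It returns a pandas DataFrame only when the pandas converter is active, for example after pandas2ri.activate() as in the question.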

>>> pd_df_mtcars = ro.conversion.rpy2py(r_df_mtcars)
>>> pd_df_Auto = ro.conversion.rpy2py(r_df_Auto)
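If you prefer not to call pandas2ri.activate() globally, rpy2 3.x also provides the localconverter context manager, which enables the pandas converter only inside a with block. A minimal sketch, assuming the rpy2 3.3 API used in this answer and an R installation with the requested package; the helper name fetch_r_dataset_as_pandas is hypothetical, and since the conversion API has kept evolving in later rpy2 releases, check the documentation for your version.

```python
def fetch_r_dataset_as_pandas(package: str, name: str):
    """Load data set `name` from R package `package` and return it as a pandas DataFrame.

    Assumes rpy2 3.x (the API used in this answer) and R with `package` installed.
    """
    # Imported lazily so this helper can be defined even where rpy2 is absent.
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter
    from rpy2.robjects.packages import importr, data

    pkg = importr(package)
    r_df = data(pkg).fetch(name)[name]
    # Enable the pandas converter only inside this block instead of globally.
    with localconverter(ro.default_converter + pandas2ri.converter):
        return ro.conversion.rpy2py(r_df)
```

For example, fetch_r_dataset_as_pandas('ISLR', 'Auto') should give the same frame as pd_df_Auto above.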

Examine both data frames with pandas head():

>>> pd_df_mtcars.head()
                    mpg  cyl   disp     hp  drat     wt   qsec   vs   am  gear  carb
Mazda RX4          21.0  6.0  160.0  110.0  3.90  2.620  16.46  0.0  1.0   4.0   4.0
Mazda RX4 Wag      21.0  6.0  160.0  110.0  3.90  2.875  17.02  0.0  1.0   4.0   4.0
Datsun 710         22.8  4.0  108.0   93.0  3.85  2.320  18.61  1.0  1.0   4.0   1.0
Hornet 4 Drive     21.4  6.0  258.0  110.0  3.08  3.215  19.44  1.0  0.0   3.0   1.0
Hornet Sportabout  18.7  8.0  360.0  175.0  3.15  3.440  17.02  0.0  0.0   3.0   2.0
>>> pd_df_Auto.head()
    mpg  cylinders  displacement  horsepower  weight  acceleration  year  origin                       name
1  18.0        8.0         307.0       130.0  3504.0          12.0  70.0     1.0  chevrolet chevelle malibu
2  15.0        8.0         350.0       165.0  3693.0          11.5  70.0     1.0          buick skylark 320
3  18.0        8.0         318.0       150.0  3436.0          11.0  70.0     1.0         plymouth satellite
4  16.0        8.0         304.0       150.0  3433.0          12.0  70.0     1.0              amc rebel sst
5  17.0        8.0         302.0       140.0  3449.0          10.5  70.0     1.0                ford torino

To convert a pandas DataFrame back to an R data frame, use:

>>> r_mtcars_df = ro.conversion.py2rpy(pd_df_mtcars)
>>> r_Auto_df = ro.conversion.py2rpy(pd_df_Auto)
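The reverse direction works the same way. A hedged sketch, again assuming the rpy2 3.x API and using a hypothetical helper name, that converts a pandas DataFrame to R and back inside a localconverter block so you can sanity-check the round trip:

```python
def pandas_round_trip(pd_df):
    """Convert a pandas DataFrame to an R data.frame and back; return (r_df, pandas_df).

    Assumes the rpy2 3.x conversion API shown in this answer.
    """
    # Imported lazily so the helper can be defined even where rpy2 is absent.
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter

    with localconverter(ro.default_converter + pandas2ri.converter):
        r_df = ro.conversion.py2rpy(pd_df)  # pandas -> R data.frame
        back = ro.conversion.rpy2py(r_df)   # R data.frame -> pandas
    return r_df, back
```

For instance, after r_Auto_df, check = pandas_round_trip(pd_df_Auto), check.shape should equal pd_df_Auto.shape.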
