Reputation: 904
After reading in a .csv file using pandas, and then converting it into an R dataframe using the rpy2 package, I created a model using some R functions (also via rpy2), and now want to take the summary of the model and convert it into a Pandas dataframe (so that I can either save it as a .csv file or use it for other purposes).
I have followed out the instructions on the pandas site (source: https://pandas.pydata.org/pandas-docs/stable/r_interface.html) in order to figure it out:
import pandas as pd
from rpy2.robjects import r
import sys
import rpy2.robjects.packages as rpackages
from rpy2.robjects.vectors import StrVector
from rpy2.robjects import r, pandas2ri
pandas2ri.activate()
caret = rpackages.importr('caret')
broom= rpackages.importr('broom')
my_data= pd.read_csv("my_data.csv")
r_dataframe= pandas2ri.py2ri(my_data)
preprocessing= ["center", "scale"]
center_scale= StrVector(preprocessing)
#these are the columns in my data frame that will consist of my predictors in the model
predictors= ['predictor1','predictor2','predictor3']
predictors_vector= StrVector(predictors)
#this column from the dataframe consists of the outcome of the model
outcome= ['fluorescence']
outcome_vector= StrVector(outcome)
#this line extracts the columns of the predictors from the dataframe
columns_predictors= r_dataframe.rx(True, columns_vector)
#this line extracts the column of the outcome from the dataframe
column_response= r_dataframe.rx(True, column_response)
cvCtrl = caret.trainControl(method = "repeatedcv", number= 20, repeats = 100)
model_R= caret.train(columns_predictors, columns_response, method = "glmStepAIC", preProc = center_scale, trControl = cvCtrl)
summary_model= base.summary(model_R)
coefficients= stats.coef(summary_model)
pd_dataframe = pandas2ri.ri2py(coefficients)
pd_dataframe.to_csv("coefficents.csv")
Although this workflow is ostensibly correct, the output .csv file did not meet my needs, as the names of the columns and rows were removed. When I ran the command type(pd_dataframe)
, I find that it is a <type 'numpy.ndarray'>
. Although the information of the table is still present, the new formatting has removed the names of the columns and rows.
So I ran the command type(coefficients)
and found that it was a <class 'rpy2.robjects.vectors.Matrix'>
. Since this Matrix object still retained the names of my columns and rows, I tried to convert it into an R objects DataFrame, but my efforts proved to be futile. Furthermore, I don't know why the line pd_dataframe = pandas2ri.ri2py(coefficients)
did not yield a pandas DataFrame object, nor why it did not retain the names of my columns and rows.
Can anybody recommend an approach so I can get some kind of pandas DataFrame that retains the names of my columns and rows?
UPDATE
A new method was mentioned in the documents of a slightly older version of the package called pandas2ri.ri2py_dataframe
(source: https://rpy2.readthedocs.io/en/version_2.7.x/changes.html), and now I have a proper data frame instead of the numpy array. However, I still can't get the names of the rows and columns to be transferred properly. Any suggestions?
Upvotes: 0
Views: 3130
Reputation: 559
For future readers, there is an easier way now to convert rpy2 dataframes to pandas dataframes:
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter
r_dataframef = ro.DataFrame( ... ) # this is your r dataframe
with localconverter(ro.default_converter + pandas2ri.converter):
pandas_dataframe = ro.conversion.rpy2py(r_dataframe)
pandas_dataframe # this is your pandas dataframe now
Taken from the docs: https://rpy2.github.io/doc/v3.0.x/html/generated_rst/pandas.html#from-r-to-pandas
Upvotes: 1
Reputation: 11543
May be it should happen automatically during conversion, but in the meantime row and column names can easily be obtained from the R object and added to the pandas DataFrame
. For example the column names for the R matrix should be at: https://rpy2.github.io/doc/v2.9.x/html/vector.html#rpy2.robjects.vectors.Matrix.colnames
Upvotes: 3