Reputation: 189
I'm currently working with functional MRI data in R, but I need to import it into Python for some faster analysis. How can I do that efficiently?
In R I currently have a list of 198135 dataframes, each with 5 variables and 84 observations of connectivity between brain regions. I need to reproduce the same 198135 dataframes in Python to run some specific analyses there (with the same structure as in R: one object that contains all the dataframes separately).
Initially I tried exporting an .RDS file from R and then importing it into Python using "pyreadr", but I get empty objects in every attempt with the "pyreadr.read_r()" function.
My other method was to save every dataframe of the R list as a separate .csv file and then import them into Python. That way I got what I wanted (I tested the code with only 100 dataframes). The problem with this method is that it is highly inefficient and slow.
I found several answers to similar problems, but most of them merge all the dataframes and load the result into Python as a single .csv, which is not the solution I need.
Is there a more efficient way to do this without altering the data structure I described?
Thanks for your help!
# This is the code in R for an example
a <- as.data.frame(cbind(c(1:3), c(1:3), c(4:6), c(7:9)))
b <- as.data.frame(cbind(c(11:13), c(21:23), c(64:66), c(77:79)))
c <- as.data.frame(cbind(c(31:33), c(61:63), c(34:36), c(57:59)))
d <- as.data.frame(cbind(c(12:14), c(13:15), c(54:56), c(67:69)))
e <- as.data.frame(cbind(c(31:33), c(51:53), c(54:56), c(37:39)))
somelist_of_df <- list(a,b,c,d,e)
saveRDS(somelist_of_df, "somefile.rds")
## This is the function I used from pyreadr in Python
import pyreadr
results = pyreadr.read_r('/somepath/somefile.rds')
Upvotes: 4
Views: 3895
Reputation: 192
I cannot comment on @crlagos0's answer because I don't have enough reputation, so I want to add a couple of things here. In R, seq_along(list_of_things) is enough; there is no need to write seq_along(1:length(list_of_things)). Also, I want to point out that the official package for reading and writing feather files in R is called arrow, and you can find its documentation here. In Python it is pyarrow.
Upvotes: 0
Reputation: 3407
Pyreadr cannot currently read R lists, so you need to save the dataframes individually. You also need to save them to an RDA file, so that a single file can hold multiple dataframes:
# first construct a list with the names of dataframes you want to save
# instead of the dataframes themselves
somelist_of_df <- list("a", "b", "c", "d", "e")
do.call("save", c(somelist_of_df, file="somefile.rda"))
or any other variant as described here.
Then you can read the file in python:
import pyreadr
results = pyreadr.read_r('/somepath/somefile.rda')
The advantage is that there will be only one file with all dataframes.
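Since pyreadr.read_r() returns a dictionary-like object keyed by the R object names, the list structure has to be rebuilt on the Python side. A minimal sketch of that step follows; the helper name frames_in_order is hypothetical (it is not part of pyreadr), and it assumes you know the names that were passed to save() in R:

```python
def frames_in_order(results, names):
    """Return the objects stored under `names`, in that order.

    `results` is any mapping from object name to dataframe, such as the
    dictionary-like object returned by pyreadr.read_r().
    """
    missing = [name for name in names if name not in results]
    if missing:
        raise KeyError(f"objects not found in file: {missing}")
    return [results[name] for name in names]
```

With the example above, this would be called as frames_in_order(results, ["a", "b", "c", "d", "e"]) to get a Python list in the same order as the R list.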
Upvotes: 0
Reputation: 51
Pandas also provides a direct way to read a .feather file:
pd.read_feather()
Upvotes: 0
Reputation: 189
Well, thanks for the help in the other answers, but it's not exactly what I was looking for (I wanted to export a single file containing the list of dataframes and then load that single file into Python, keeping the same structure). To use feather you have to split the list into its individual dataframes, much like saving separate .csv files, and then load each of them into Python (or R). Still, it must be said that it's much faster than the .csv method.
I'm leaving the code that I used successfully in a separate answer; it may be useful to other people, since I used a simple loop to load the dataframes into Python as a list:
## Exporting a list of dataframes from R to .feather files
library(feather) #required package
a <- as.data.frame(cbind(c(1:3), c(1:3), c(4:6), c(7:9))) #Example DFs
b <- as.data.frame(cbind(c(11:13), c(21:23), c(64:66), c(77:79)))
c <- as.data.frame(cbind(c(31:33), c(61:63), c(34:36), c(57:59)))
d <- as.data.frame(cbind(c(12:14), c(13:15), c(54:56), c(67:69)))
e <- as.data.frame(cbind(c(31:33), c(51:53), c(54:56), c(37:39)))
somelist_of_df <- list(a,b,c,d,e)
## With sapply you loop over the list indices to create the .feather files
sapply(seq_along(somelist_of_df),
       function(i) write_feather(somelist_of_df[[i]],
                                 paste0("/your/directory/", "DF", i, ".feather")))
(Using just a MacBook Air, the code above took less than 5 seconds to run for a list of 198135 DFs)
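## Importing .feather files into a list of DFs in Python
import os
import feather
directory = '/your/directory'
# Keep only the DF<i>.feather files and sort them by i: os.listdir() returns
# names in arbitrary order, and the list should match the order of the R list
filenames = [f for f in os.listdir(directory)
             if f.startswith('DF') and f.endswith('.feather')]
filenames.sort(key=lambda f: int(f[len('DF'):-len('.feather')]))
py_list_of_DFs = []
for filename in filenames:
    DF = feather.read_dataframe(os.path.join(directory, filename))
    py_list_of_DFs.append(DF)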
(This code did the job for me, although it was a bit slow: it took 12 minutes to load the 198135 DFs)
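Since the loading time comes from opening a very large number of small files one after another, one thing that might be worth trying is reading them concurrently. The following is only a sketch under assumptions: load_all and file_index are hypothetical helper names, the files are assumed to be named DF<i>.feather as in the R code above, and the reader function is passed in as an argument (for example feather.read_dataframe). Whether threads actually help depends on the disk and on the reader, so it is something to measure, not a guaranteed gain.

```python
import os
import re
from concurrent.futures import ThreadPoolExecutor


def file_index(filename):
    """Return the integer i from a name like 'DF123.feather', else None."""
    match = re.fullmatch(r"DF(\d+)\.feather", filename)
    return int(match.group(1)) if match else None


def load_all(directory, reader, max_workers=8):
    """Read every DF<i>.feather file in `directory`, ordered by i."""
    names = [f for f in os.listdir(directory) if file_index(f) is not None]
    names.sort(key=file_index)
    paths = [os.path.join(directory, name) for name in names]
    # executor.map yields results in the order of `paths`,
    # regardless of which read finishes first
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(reader, paths))
```

It would be called as py_list_of_DFs = load_all('/your/directory', feather.read_dataframe). The numeric sort keeps DF2 before DF10, so the Python list follows the order of the original R list.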
I hope this is useful to somebody.
Upvotes: 3