Reputation: 1131
I am using the feather package to exchange data between Python (which collects the data) and R (used for analysis). Writing and reading the data in Python is extremely fast. However, reading the same feather object in R is very slow: on the order of minutes for a roughly 10 MB feather object with about 80K rows and 24 columns. I read the feather object from local disk each time, so network latency is not the cause.
The only explanation I can think of is that some of the variables (5, to be exact) are of type int64 in Python and get coerced to double when R imports them. R emits the "coercing int64 to double" warning while reading the feather object. Can anyone confirm this, or is there another explanation?
EDIT: Coercion is not the problem. I re-saved the int64 columns in Python as int32, and reading in R is still just as slow. Need help.
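For reference, the int64-to-int32 re-save described in the first edit could look roughly like the sketch below. This is not the exact code used: the helper name downcast_int64, the sample columns, and the output path are hypothetical, and it assumes pandas (plus pyarrow for the commented-out write step).

```python
import numpy as np
import pandas as pd


def downcast_int64(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which int64 columns are stored as int32 when every value fits."""
    out = df.copy()
    limits = np.iinfo(np.int32)
    for col in out.select_dtypes(include=["int64"]).columns:
        # Leave the column alone if any value would overflow int32.
        if out[col].min() >= limits.min and out[col].max() <= limits.max:
            out[col] = out[col].astype("int32")
    return out


# Hypothetical data standing in for the real 80K x 24 frame.
df = pd.DataFrame(
    {
        "id": pd.Series([1, 2, 3], dtype="int64"),
        "big": pd.Series([1, 2, 2**40], dtype="int64"),
        "value": [0.1, 0.2, 0.3],
    }
)
small = downcast_int64(df)

# Writing the result back out (hypothetical path; requires pyarrow):
# small.to_feather("test.feather")
```

Columns whose values do not fit in 32 bits stay int64, so R would still warn about those.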
EDIT 2: Example Code As requested, here is the code I am running. Just reading the feather object from a sub-folder essentially:
library(feather)
test_feather = read_feather("C:/my_folder/subfolder/test.feather")
Upvotes: 2
Views: 727
Reputation: 1131
The issue was that the feather object was created in a Linux environment, while it was being read in R on a Windows system. I don't fully know the details, but essentially each OS has a different specification for representing binary data on disk.
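A quick way to rule out the file being altered on its way from the Linux machine to the Windows one is to fingerprint it on both sides and compare. The sketch below is only a diagnostic and does not confirm the explanation above. The helper name describe_feather is hypothetical. It relies on the leading magic bytes being FEA1 for the original Feather format and ARROW1 for Feather V2 (the Arrow IPC file format).

```python
import hashlib
from pathlib import Path


def describe_feather(path):
    """Return size, SHA-256 digest, and a format guess based on the leading magic bytes."""
    data = Path(path).read_bytes()
    if data.startswith(b"ARROW1"):
        kind = "Feather V2 (Arrow IPC file)"
    elif data.startswith(b"FEA1"):
        kind = "Feather V1"
    else:
        kind = "unknown"
    return {
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "format": kind,
    }
```

Running describe_feather on the file on each machine and comparing the results shows whether the bytes are identical. If the size or digest differs, the copy step changed the file; if they match, the slowdown lies elsewhere, for example in the reader version.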
I don't remember reading this issue / warning in the documentation (though I suppose it is obvious and implicit), but perhaps a little reminder might save some future people from making the same mistake.
Upvotes: 3