Reading feather object in R is slow

Question

I am using the feather package to exchange data between Python (which collects the data) and R (which I use for analysis). Writing and reading the data in Python is extremely fast. However, reading the same feather object in R is very slow: on the order of minutes for a feather object of about 10 MB with roughly 80K rows and 24 columns. I read the feather object from local disk each time, so network latency is not the cause.

My only guess is that some of the variables (5, to be exact) are int64 in Python and get coerced to double when R imports them. R gives a warning about coercing int64 to double while reading the feather object. Can anyone confirm this, or is there another explanation?
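For context on that warning: a double can represent integers exactly only up to 2^53 in magnitude, so coercing int64 to double is lossless for smaller values and can change larger ones. Here is a minimal Python sketch of that limit, using only the standard library. The function name `survives_double_coercion` is hypothetical and only for illustration.

```python
def survives_double_coercion(value: int) -> bool:
    """Return True if an integer is unchanged by a round trip through a 64-bit float."""
    return int(float(value)) == value


# 2**53 is exactly representable as a double; 2**53 + 1 is not.
print(survives_double_coercion(2**53))      # True
print(survives_double_coercion(2**53 + 1))  # False
```

This only concerns the accuracy of the values after coercion; it says nothing about read speed.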

EDIT: Coercion is not the problem. I re-saved the int64 columns in Python as int32, and reading in R is still just as slow. Need help.
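Re-saving an int64 column as int32 keeps the data intact only when every value fits in the int32 range. A minimal sketch of that check, assuming the column values are available as plain Python integers. The name `fits_int32` is hypothetical and not part of any library.

```python
# Bounds of a signed 32-bit integer.
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def fits_int32(values) -> bool:
    """Return True if every integer in `values` lies within the int32 range."""
    return all(INT32_MIN <= v <= INT32_MAX for v in values)


print(fits_int32([0, 80_000, -5]))  # True
print(fits_int32([2**31]))          # False
```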

EDIT 2: Example code. As requested, here is the code I am running. It essentially just reads the feather object from a sub-folder:

library(feather)
test_feather = read_feather("C:/my_folder/subfolder/test.feather")

guy · Accepted Answer

The issue was that the feather object was created in a Linux environment, while it was being read in R on a Windows system. I don't fully know the details, but essentially each OS has a different specification for representing binary data on disk.

I don't remember seeing this issue or a warning about it in the documentation (though I suppose it is obvious and implicit), but a little reminder might save others from making the same mistake.
