michalrudko

Reputation: 1530

Is it possible to read an ORC file into a Spark DataFrame in sparklyr?

I know that sparklyr has read methods for several file formats, such as spark_read_csv, spark_read_json and spark_read_parquet.

What about reading ORC files? Is this supported by the library yet?

I know I can use read.orc in SparkR or this solution, but I'd like to keep my code in sparklyr.

Upvotes: 3

Views: 1401

Answers (1)

zero323

Reputation: 330063

You can use the low-level Spark API, in the same way I described in my answer to "Transfer data from database to Spark using sparklyr":

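library(dplyr)
library(sparklyr)

sc <- spark_connect(...)

# Get the SparkSession, build a DataFrameReader for the ORC format,
# load the data from path and register it as a temporary view.
spark_session(sc) %>%
  invoke("read") %>%
  invoke("format", "orc") %>%
  invoke("load", path) %>%
  invoke("createOrReplaceTempView", name)

# Reference the registered view as a dplyr table.
df <- tbl(sc, name)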

where name is an arbitrary name used to identify the table and path is the location of the ORC data.
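If you need this in more than one place, the same chain of calls can be wrapped in a small function. This is only a sketch: read_orc_view is a hypothetical name, not part of sparklyr, and it assumes sc is an open connection and path points at readable ORC data.

```r
library(dplyr)
library(sparklyr)

# Hypothetical helper (not a sparklyr function): registers the ORC data at
# `path` as a temporary view called `name` and returns a dplyr reference to it.
read_orc_view <- function(sc, name, path) {
  spark_session(sc) %>%
    invoke("read") %>%
    invoke("format", "orc") %>%
    invoke("load", path) %>%
    invoke("createOrReplaceTempView", name)

  tbl(sc, name)
}

# Usage, assuming `sc` and `path` are already defined:
# df <- read_orc_view(sc, "my_orc_table", path)
```

Note that the view is temporary, so it only lives as long as the Spark session it was created in.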

In current sparklyr versions you should be able to replace the above with spark_read_source:

spark_read_source(sc, name, source = "orc", options = list(path = path))
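As far as I know, more recent sparklyr releases also ship dedicated spark_read_orc and spark_write_orc functions, so it may be worth checking whether your installed version has them. The sketch below writes a local data frame out as ORC and reads it back; orc_round_trip and the table names "orc_src" and "orc_copy" are made up for illustration, and the usage lines assume a local Spark installation.

```r
library(sparklyr)

# Hypothetical helper: copies `data` to Spark, writes it to `path` as ORC,
# then reads the ORC data back as a new Spark table.
orc_round_trip <- function(sc, data, path) {
  src <- sdf_copy_to(sc, data, name = "orc_src", overwrite = TRUE)
  spark_write_orc(src, path, mode = "overwrite")
  spark_read_orc(sc, name = "orc_copy", path = path)
}

# Usage, assuming a local Spark installation is available:
# sc <- spark_connect(master = "local")
# df <- orc_round_trip(sc, mtcars, file.path(tempdir(), "mtcars_orc"))
# sdf_nrow(df)
# spark_disconnect(sc)
```

If spark_read_orc is not available in your version, the spark_read_source call above should do the same job.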

Upvotes: 5
