Howardyan

Reputation: 657

How to use PySpark to read an ORC file

Spark supports several compressed, columnar file formats. One is Parquet, which is very easy to read:

from pyspark.sql import HiveContext

hiveCtx = HiveContext(sc)  # sc is an existing SparkContext (Spark 1.x API)
df = hiveCtx.parquetFile(parquetFile)  # parquetFile is a path string

But for ORC files, I cannot find a good example showing how to read them with PySpark.

Upvotes: 4

Views: 20454

Answers (1)

Thiago Baldim

Reputation: 7732

Well, there are two ways:

Spark 2.x:

orc_df = spark.read.orc('python/test_support/sql/orc_partitioned')

Spark 1.6:

df = hiveContext.read.orc('python/test_support/sql/orc_partitioned')

Upvotes: 6
