Reputation: 657
There are two common columnar file formats for Spark. One is Parquet, which is very easy to read:
from pyspark.sql import HiveContext
hiveCtx = HiveContext(sc)
hiveCtx.parquetFile(parquetFile)
But for ORC files, I cannot find a good example showing how to read them with PySpark.
Upvotes: 4
Views: 20454
Reputation: 7732
Well, there are two ways, depending on your Spark version:
Spark 2.x:
orc_df = spark.read.orc('python/test_support/sql/orc_partitioned')
Spark 1.6:
df = hiveContext.read.orc('python/test_support/sql/orc_partitioned')
Upvotes: 6