Reputation: 13118
I can write a partitioned parquet file in pyspark like this:
rdd.write
.partitionBy("created_year", "created_month")
.parquet("hdfs:///my_file")
The parquet file is automatically partitioned by created_year and created_month. How do I do the same in Java? I don't see an option in the ParquetWriter class. Is there another class that can do that?
Thanks,
Upvotes: 1
Views: 1427
Reputation: 1370
You have to convert your RDD into a DataFrame and then call the write.parquet function.
df = sql_context.createDataFrame(rdd)
df.write.parquet("hdfs:///my_file", partitionBy=["created_year", "created_month"])
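The same approach in the Spark Java API would look roughly like this (a sketch, assuming an existing SparkSession named spark and a JavaRDD of a bean class MyRecord whose fields include created_year and created_month; those names are placeholders, not from the question):

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Convert the RDD of beans into a DataFrame (Dataset<Row>).
Dataset<Row> df = spark.createDataFrame(rdd, MyRecord.class);

// Write parquet partitioned into created_year/created_month subdirectories.
df.write()
  .partitionBy("created_year", "created_month")
  .parquet("hdfs:///my_file");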
Upvotes: 1