Sean Nguyen

Reputation: 13118

How to write partitioned Parquet files in Java, as in PySpark?

I can write a Parquet file into partitions in PySpark like this:

rdd.write
 .partitionBy("created_year", "created_month")
 .parquet("hdfs:///my_file")

The output is automatically partitioned by created_year and created_month. How can I do the same in Java? I don't see an option for this in the ParquetWriter class. Is there another class that can do it?

Thanks,

Upvotes: 1

Views: 1427

Answers (1)

iurii_n

Reputation: 1370

You have to convert your RDD into a DataFrame and then call its Parquet write function, passing the partition columns:

df = sql_context.createDataFrame(rdd)
df.write.parquet("hdfs:///my_file", partitionBy=["created_year", "created_month"])
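On the Java side, Spark's Java API exposes the same writer on a Dataset<Row>: df.write().partitionBy("created_year", "created_month").parquet("hdfs:///my_file"). What that call produces on disk is a Hive-style directory tree with one column=value sub-directory per partition column. The sketch below is plain Java with no Spark or Parquet dependency and only illustrates how records map to those directories; the class PartitionLayoutSketch and its helpers are hypothetical names, and it skips details Spark handles, such as null values and escaping of special characters.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Spark or Parquet.
class PartitionLayoutSketch {

    // Builds the Hive-style sub-directory for one record,
    // e.g. "created_year=2019/created_month=7".
    static String partitionPath(Map<String, Object> record, List<String> partitionColumns) {
        StringBuilder path = new StringBuilder();
        for (String column : partitionColumns) {
            if (path.length() > 0) {
                path.append('/');
            }
            path.append(column).append('=').append(record.get(column));
        }
        return path.toString();
    }

    // Groups records by their partition directory, keeping first-seen order.
    static Map<String, List<Map<String, Object>>> groupByPartition(
            List<Map<String, Object>> records, List<String> partitionColumns) {
        Map<String, List<Map<String, Object>>> groups = new LinkedHashMap<>();
        for (Map<String, Object> record : records) {
            String dir = partitionPath(record, partitionColumns);
            groups.computeIfAbsent(dir, k -> new ArrayList<>()).add(record);
        }
        return groups;
    }

    // Convenience constructor for a sample record (made-up data).
    static Map<String, Object> row(int id, int year, int month) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("id", id);
        r.put("created_year", year);
        r.put("created_month", month);
        return r;
    }

    public static void main(String[] args) {
        List<String> partitionColumns = List.of("created_year", "created_month");
        List<Map<String, Object>> records = List.of(
                row(1, 2019, 7), row(2, 2019, 7), row(3, 2020, 1));

        // Print the directory each group of records would be written under.
        groupByPartition(records, partitionColumns).forEach((dir, rows) ->
                System.out.println("hdfs:///my_file/" + dir + " -> " + rows.size() + " record(s)"));
    }
}
```

With Spark itself you would not write this grouping by hand; the partitionBy call on the writer does it for you. The sketch is only meant to show why the plain ParquetWriter class, which writes a single file, has no equivalent option.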

Upvotes: 1
