Reputation: 529
I'm using Java Spark.
I'm trying to write to a Hive table partitioned by a date column. This is what I'm trying:
Dataset<Row> ds = dataframe.select(cols).withColumn("load_date", functions.lit("08.07.2018").cast("date"));
ds.write().mode(mode).partitionBy("load_date").save(hdfsDirectory);
After running the lines above, I see the following directory in HDFS:
/load_date=__HIVE_DEFAULT_PARTITION__
which means the partition value is null.
So how can I write partition by date?
Thanks.
Upvotes: 0
Views: 851
Reputation: 6994
The easiest way is to use the following function, which parses the string with one pattern and re-formats it with another:
from_unixtime(unix_timestamp('2016/06/01','yyyy/MM/dd'),'yyyyMMdd')
I prefer to use Spark SQL to achieve this:
ds.createOrReplaceTempView("tempTable");
Dataset<Row> dsWithLoadDate = spark.sql("select *, from_unixtime(unix_timestamp('2016/06/01','yyyy/MM/dd'),'yyyyMMdd') as load_date from tempTable");
dsWithLoadDate.write().mode(mode).partitionBy("load_date").save(hdfsDirectory);
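Outside Spark, the same parse-then-reformat step can be sketched in plain Java with SimpleDateFormat (the pattern syntax Spark's legacy date functions follow); the class name here is just illustrative:

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class FromUnixtimeDemo {
    public static void main(String[] args) throws Exception {
        // Mirrors from_unixtime(unix_timestamp('2016/06/01','yyyy/MM/dd'),'yyyyMMdd'):
        // parse the string with the input pattern, then emit it in the output pattern
        SimpleDateFormat in = new SimpleDateFormat("yyyy/MM/dd");
        SimpleDateFormat out = new SimpleDateFormat("yyyyMMdd");
        Date d = in.parse("2016/06/01");
        System.out.println(out.format(d)); // 20160601
    }
}
```

Note the result is a string like 20160601, which works fine as a partition value for partitionBy but is not a DATE-typed column.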
Upvotes: 1
Reputation: 11
To use cast, the date string has to be in the standard ISO form (year-month-day):
Dataset<Row> ds = dataframe.select(cols).withColumn("load_date", functions.lit("2018-07-08").cast("date"));
Otherwise use the o.a.s.sql.functions.to_date function and provide a format compatible with SimpleDateFormat.
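As a plain-Java sanity check of the format requirement (a sketch, not Spark code): parsing the non-ISO string with an explicit pattern yields the ISO yyyy-MM-dd form that cast("date") expects:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DatePatternDemo {
    public static void main(String[] args) {
        // "08.07.2018" is not ISO, which is why cast("date") returned null;
        // parsing with an explicit dd.MM.yyyy pattern recovers the date
        LocalDate d = LocalDate.parse("08.07.2018", DateTimeFormatter.ofPattern("dd.MM.yyyy"));
        System.out.println(d); // prints 2018-07-08 (ISO form)
    }
}
```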
Upvotes: 1