Ya Ko

Reputation: 529

Write Partition with Date column Java-Spark

I'm using Spark with Java.

I'm trying to write to a Hive table partitioned by a date column. This is what I'm trying:

Dataset<Row> ds = dataframe.select(cols).withColumn("load_date", functions.lit("08.07.2018").cast("date"));
ds.write().mode(mode).partitionBy("load_date").save(hdfsDirectory);

After running the lines above, I see the following directory in HDFS:

/load_date=__HIVE_DEFAULT_PARTITION__

That means the partition value is null.
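To make the link between the null value and the directory name concrete, here is a minimal plain-Java sketch (no Spark involved). It assumes cast("date") behaves roughly like a strict ISO yyyy-MM-dd parse that yields null on failure (Spark's real cast accepts a few more input forms), and that a null partition value is written as __HIVE_DEFAULT_PARTITION__. The class and method names are hypothetical, and Spark's escaping of special characters in partition paths is left out.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

public class PartitionDirSketch {
    // Directory value Hive/Spark use for a null partition value.
    static final String DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__";

    // Hypothetical stand-in for cast("date"): accepts only ISO yyyy-MM-dd
    // and returns null otherwise instead of throwing.
    static LocalDate castToDateOrNull(String s) {
        try {
            return LocalDate.parse(s);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    // Hypothetical helper: the column=value directory name for one partition
    // (path escaping omitted).
    static String partitionDir(String column, LocalDate value) {
        String v = (value == null) ? DEFAULT_PARTITION : value.toString();
        return column + "=" + v;
    }

    public static void main(String[] args) {
        // "08.07.2018" is not yyyy-MM-dd, so the value becomes null.
        System.out.println(partitionDir("load_date", castToDateOrNull("08.07.2018")));
        // An ISO literal survives the cast and names the directory.
        System.out.println(partitionDir("load_date", castToDateOrNull("2018-07-08")));
    }
}
```

Running main prints load_date=__HIVE_DEFAULT_PARTITION__ and then load_date=2018-07-08, which mirrors the directory seen in HDFS versus the one you would expect.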

So how can I partition by a date column?

Thanks.

Upvotes: 0

Views: 851

Answers (2)

Avishek Bhattacharya

Reputation: 6994

An easier way is to use the following expression:

from_unixtime(unix_timestamp('2016/06/01','yyyy/MM/dd'),'yyyyMMdd')
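As a rough plain-Java analogue of what that expression computes: parse the literal with one pattern, then re-format it with another. This is only an illustration, not Spark's implementation; the Spark version goes through a Unix timestamp in the session time zone, which for a date-only value lands back on the same calendar day. The class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class LoadDateFormatSketch {
    // Parse raw with inPattern, then format the date with outPattern,
    // like from_unixtime(unix_timestamp(raw, inPattern), outPattern).
    static String convert(String raw, String inPattern, String outPattern) {
        LocalDate d = LocalDate.parse(raw, DateTimeFormatter.ofPattern(inPattern));
        return d.format(DateTimeFormatter.ofPattern(outPattern));
    }

    public static void main(String[] args) {
        // Same literal and patterns as the SQL expression above.
        System.out.println(convert("2016/06/01", "yyyy/MM/dd", "yyyyMMdd")); // 20160601
    }
}
```

Note that from_unixtime returns a string, so with this approach load_date is a string column such as 20160601 rather than a date-typed column.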

I prefer to use Spark SQL to achieve this:

ds.createOrReplaceTempView("tempTable");
Dataset<Row> dsWithLoadDate = spark.sql("select *, from_unixtime(unix_timestamp('2016/06/01','yyyy/MM/dd'),'yyyyMMdd') as load_date from tempTable");

dsWithLoadDate.write().mode(mode).partitionBy("load_date").save(hdfsDirectory);

Upvotes: 1

user10191052

Reputation: 11

To use cast, the date string has to be in the standard form (yyyy-MM-dd):

Dataset<Row> ds = dataframe.select(cols).withColumn("load_date", functions.lit("2018-07-08").cast("date"));

Otherwise, use the org.apache.spark.sql.functions.to_date function and provide a format compatible with SimpleDateFormat.
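If the date only arrives as a driver-side string, one option is to normalize it before building the literal. This is a minimal plain-Java sketch, assuming the question's "08.07.2018" means 8 July 2018 (dd.MM.yyyy, as in the answer above); the class and method names are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class NormalizeDateSketch {
    // Parses raw with a SimpleDateFormat pattern and re-formats it as
    // yyyy-MM-dd, the form that cast("date") accepts.
    static String toIsoDate(String raw, String pattern) {
        SimpleDateFormat in = new SimpleDateFormat(pattern);
        in.setLenient(false); // reject impossible dates such as 32.07.2018
        try {
            Date parsed = in.parse(raw);
            // Both formatters use the JVM default time zone, so the calendar day is kept.
            return new SimpleDateFormat("yyyy-MM-dd").format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a " + pattern + " date: " + raw, e);
        }
    }

    public static void main(String[] args) {
        String iso = toIsoDate("08.07.2018", "dd.MM.yyyy");
        System.out.println(iso); // 2018-07-08
    }
}
```

The resulting string could then go into functions.lit(iso).cast("date"). Alternatively, functions.to_date(functions.lit("08.07.2018"), "dd.MM.yyyy") lets Spark do the parsing itself (the two-argument to_date is available from Spark 2.2). Spark 3.x documents its own datetime pattern letters instead of SimpleDateFormat, but dd.MM.yyyy means the same thing in both.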

Upvotes: 1
