Reputation: 529
I'm using Java Spark.
I'm trying to write to a Hive table partitioned by a date column. This is what I'm trying:
Dataset<Row> ds = dataframe.select(cols).withColumn("load_date", functions.lit("08.07.2018").cast("date"));
ds.write().mode(mode).partitionBy("load_date").save(hdfsDirectory);
After running the lines above, I see the following directory in HDFS:
/load_date=__HIVE_DEFAULT_PARTITION__
which means the partition value is null.
So how can I write partition by date?
Thanks.
Upvotes: 0
Views: 851
Reputation: 6994
The easiest way is to use the following function, which parses the string with one pattern and re-formats it with another:
from_unixtime(unix_timestamp('2016/06/01','yyyy/MM/dd'),'yyyyMMdd')
I prefer to use Spark SQL to achieve this:
ds.createOrReplaceTempView("tempTable");
Dataset<Row> dsWithLoadDate = spark.sql("select *, from_unixtime(unix_timestamp('2016/06/01','yyyy/MM/dd'),'yyyyMMdd') as load_date from tempTable");
dsWithLoadDate.write().mode(mode).partitionBy("load_date").save(hdfsDirectory);
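Outside Spark, the same parse-then-reformat step can be sketched in plain Java with SimpleDateFormat (the pattern syntax Spark's legacy date functions follow); the class name here is just illustrative:

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class FromUnixtimeDemo {
    public static void main(String[] args) throws Exception {
        // Mirrors from_unixtime(unix_timestamp('2016/06/01','yyyy/MM/dd'),'yyyyMMdd'):
        // parse the string with the input pattern, then emit it in the output pattern
        SimpleDateFormat in = new SimpleDateFormat("yyyy/MM/dd");
        SimpleDateFormat out = new SimpleDateFormat("yyyyMMdd");
        Date d = in.parse("2016/06/01");
        System.out.println(out.format(d)); // 20160601
    }
}
```

Note the result is a string like 20160601, which works fine as a partition value for partitionBy but is not a DATE-typed column.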
Upvotes: 1
Reputation: 11
To use cast, the date string has to be in the standard ISO form (year-month-day):
Dataset<Row> ds = dataframe.select(cols).withColumn("load_date", functions.lit("2018-07-08").cast("date"));
Otherwise use the o.a.s.sql.functions.to_date function and provide a format compatible with SimpleDateFormat.
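As a plain-Java sanity check of the format requirement (a sketch, not Spark code): parsing the non-ISO string with an explicit pattern yields the ISO yyyy-MM-dd form that cast("date") expects:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DatePatternDemo {
    public static void main(String[] args) {
        // "08.07.2018" is not ISO, which is why cast("date") returned null;
        // parsing with an explicit dd.MM.yyyy pattern recovers the date
        LocalDate d = LocalDate.parse("08.07.2018", DateTimeFormatter.ofPattern("dd.MM.yyyy"));
        System.out.println(d); // prints 2018-07-08 (ISO form)
    }
}
```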
Upvotes: 1