Bach-Nga

Reputation: 45

How to convert from decimal to date in scala select?

I have a datetime column declared as decimal(38,0), not timestamp or date, with input values in 'yyyyMMdd' format. In Spark SQL (Scala), how do I select rows where that column, converted to a 'yyyy-MM-dd' date, is one or two days old?

I have tried:

select count(*) from table_name where to_date('column_name', 'yyyy-MM-dd') = date_sub(current_date(), 1);

This gives me a count of 0, even though the table has well over 500,000 records.

I tried:

select count(*) from table_name where from_unixtime(cast(load_dt_id as string), 'yyyy-MM-dd') = date_sub(current_date(), 1);

This returns dates in 1970 (e.g. 1970-01-31), even though no 1970 data exists in the table. When I select that column where it's like '1970%', I get "OK" with the bolt icon indicating the query was accelerated by Delta. The actual values in that column, in order, start at 20140320.

Upvotes: 2

Views: 1108

Answers (1)

Andrew

Reputation: 8758

The format argument to to_date is the format of the input, not the desired output. Assuming your input is yyyyMMdd:

import org.apache.spark.sql.functions._
import spark.implicits._

Seq("20200208").toDF("RawDate")
  .select(col("RawDate"), to_date(col("RawDate"), "yyyyMMdd").as("formatted_date"))
  .show()

+--------+--------------+
| RawDate|formatted_date|
+--------+--------------+
|20200208|    2020-02-08|
+--------+--------------+

Expanding this to filter by the derived date column:

val raw = Seq("20200208", "20200209", "20200210").toDF("RawDate")

raw.select(col("RawDate"), to_date(col("RawDate"), "yyyyMMdd").as("formatted_date"))
  .filter($"formatted_date".geq(date_add(current_date(), -1)))
  .show()
+--------+--------------+
| RawDate|formatted_date|
+--------+--------------+
|20200209|    2020-02-09|
|20200210|    2020-02-10|
+--------+--------------+
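
Since the question asks for Spark SQL specifically, the same conversion also works there. A sketch, assuming the decimal(38,0) column is named load_dt_id as in the question (cast it to a string first so to_date can parse it with the yyyyMMdd pattern):

select count(*)
from table_name
where to_date(cast(load_dt_id as string), 'yyyyMMdd') = date_sub(current_date(), 1);

As an aside, from_unixtime interprets its argument as seconds since 1970-01-01, which is why your second attempt produced dates in 1970: a value like 20140320 is only about 233 days' worth of seconds.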

Upvotes: 2
