Mederr
Mederr

Reputation: 660

How to get date from different year, month and day columns in spark (scala)

I have a DataFrame including data like:

+----+-----+---+-----+
|Year|Month|Day|...  |
+----+-----+---+-----+
|2012|    2| 20|     |
|2011|    7|  6|     |
|2015|    3| 15|     |

and I would like to add a column with date

Upvotes: 6

Views: 9429

Answers (4)

Nikunj Kakadiya
Nikunj Kakadiya

Reputation: 3008

You can just use the concat_ws function to create a date in string data type and just cast that to date.

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
//Source Data
val df = Seq((2012,2,20),(2011,7,6),(2015,3,15)).toDF("Year","Month","Day")
//using concat_ws function to create Date column and cast that column data type to date
val df1 = df.withColumn("Date",concat_ws("-",$"Year",$"Month",$"Day"))
.withColumn("Date",$"Date".cast("Date"))
display(df1)

You can see the output as below :

enter image description here

Upvotes: 1

blackbishop
blackbishop

Reputation: 32720

For Spark 3+, you can use make_date function:

df.withColumn("date", expr("make_date(Year, Month, Day)"))

Upvotes: 4

Mithril
Mithril

Reputation: 13808

Not so complex as Shaido, just

df.withColumn("date", F.to_date(F.concat_ws("-", "Year", "Month", "Day")) ).show()

Work on spark 2.4 .

Upvotes: 5

Shaido
Shaido

Reputation: 28422

Merge the columns together and then use unix_timestamp and to_date to get a timestamp column. For an input dataframe df:

df.withColumn("merge", concat_ws("-", $"Year", $"Month", $"Day"))
  .withColumn("date", to_date(unix_timestamp($"merge", "yyyy-MM-dd").cast("timestamp")))
  .drop("merge")

Upvotes: 5

Related Questions