Reputation: 660
I have a DataFrame containing data like:
+----+-----+---+-----+
|Year|Month|Day| ... |
+----+-----+---+-----+
|2012|    2| 20|     |
|2011|    7|  6|     |
|2015|    3| 15|     |
+----+-----+---+-----+
and I would like to add a date column built from the Year, Month and Day columns.
Upvotes: 6
Views: 9429
Reputation: 3008
You can use the concat_ws
function to build the date as a string and then cast that string to the date type.
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import spark.implicits._  // needed for toDF and the $"col" syntax
// Source data
val df = Seq((2012, 2, 20), (2011, 7, 6), (2015, 3, 15)).toDF("Year", "Month", "Day")
// Use concat_ws to build a "Year-Month-Day" string, then cast that column to the date type
val df1 = df.withColumn("Date", concat_ws("-", $"Year", $"Month", $"Day"))
  .withColumn("Date", $"Date".cast("Date"))
display(df1)  // display is Databricks-specific; use df1.show() elsewhere
You can see the output below:
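With the sample data above, df1.show() should print roughly:
+----+-----+---+----------+
|Year|Month|Day|      Date|
+----+-----+---+----------+
|2012|    2| 20|2012-02-20|
|2011|    7|  6|2011-07-06|
|2015|    3| 15|2015-03-15|
+----+-----+---+----------+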
Upvotes: 1
Reputation: 32720
For Spark 3+, you can use the make_date function:
df.withColumn("date", expr("make_date(Year, Month, Day)"))
Upvotes: 4
Reputation: 13808
Not as complex as Shaido's answer, just:
from pyspark.sql import functions as F
df.withColumn("date", F.to_date(F.concat_ws("-", "Year", "Month", "Day"))).show()
Works on Spark 2.4.
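For readers following the Scala answers, the same one-liner in Scala would be (a sketch, assuming import org.apache.spark.sql.functions._ and spark.implicits._ are in scope):
df.withColumn("date", to_date(concat_ws("-", $"Year", $"Month", $"Day"))).show()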
Upvotes: 5
Reputation: 28422
Merge the columns together, then use unix_timestamp to get a timestamp column and to_date to convert it to a date. For an input dataframe df:
df.withColumn("merge", concat_ws("-", $"Year", $"Month", $"Day"))
.withColumn("date", to_date(unix_timestamp($"merge", "yyyy-MM-dd").cast("timestamp")))
.drop("merge")
Upvotes: 5