Reputation: 3587
Assuming a dataframe with a date column and an Int column representing a number of months:
val df = Seq(("2011-11-11",1),("2010-11-11",3),("2012-11-11",5))
.toDF("startDate","monthsToAdd")
.withColumn("startDate",'startDate.cast(DateType))
+----------+-----------+
| startDate|monthsToAdd|
+----------+-----------+
|2011-11-11| 1|
|2010-11-11| 3|
|2012-11-11| 5|
+----------+-----------+
is there a way of creating an endDate column by adding the months to startDate without converting the date column back to string?
So basically same as the add_months function
def add_months(startDate: Column, numMonths: Int)
but passing a column instead of a literal.
Upvotes: 0
Views: 670
Reputation: 6739
you can use UDF (User Defined Functions)
to achieve this. Below I have create myUDF
function which add the months to date and returns the result date in String format and I will use this UDF to create a new column by using withColumn
on DataFrame
import java.text.SimpleDateFormat
import java.util.Calendar
import javax.xml.bind.DatatypeConverter
import org.apache.spark.sql.functions._
import sparkSession.sqlContext.implicits._
val df = Seq(("2011-11-11",1),("2010-11-11",3),("2012-11-11",5)).toDF("startDate","monthsToAdd")
val myUDF = udf {
val simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd")
(startDate: String, monthValue: Int) => {
val calendar = DatatypeConverter.parseDateTime(startDate)
calendar.add(Calendar.MONTH, monthValue)
simpleDateFormat.format(calendar.getTime)
}
}
val newDf = df.withColumn("endDate", myUDF(df("startDate"), df("monthsToAdd")))
newDf.show()
Output:
+----------+-----------+----------+
| startDate|monthsToAdd| endDate|
+----------+-----------+----------+
|2011-11-11| 1|2011-12-11|
|2010-11-11| 3|2011-02-11|
|2012-11-11| 5|2013-04-11|
+----------+-----------+----------+
Upvotes: 1