Reputation: 789
I am using PySpark 2.1. Below is the content of my DataFrame:
expecteddays,date
139,30.JUl.2017
134,01.NOV.2018
My output should look like this:
138,30.JUL.2017,<30/SEP/2018,4/FEB/2019>
The last column is populated by my functions dateRangebetween and get_date.
Below is my code:
from datetime import datetime, timedelta
import pandas as pd
from pyspark import SparkContext
from pyspark.sql import SparkSession, types
from pyspark.sql.functions import concat, explode, udf
from pyspark.sql.types import ArrayType, IntegerType, StringType, StructField, StructType

maintenance_final_join = spark.read.csv('/user/NaveenSri/adh_dev_engg/test.csv', header=True)

def get_date(dateFormat="%d-%m-%Y", addDays=0, timeNow=0):
    #print('inside get date',timesNow)
    if (addDays != 0):
        anotherTime = timeNow + timedelta(days=addDays)
    else:
        anotherTime = timeNow
    return anotherTime.strftime(dateFormat)

def dateRangebetween(expectedDate, estimatedDays):
    output_format = '%d-%m-%Y'
    dateRangeList = []
    j = 2
    #print('inside Date range',expectedDate)
    rangeEnddate = datetime.strptime(get_date(output_format, 730, expectedDate), '%d-%m-%Y').date()
    #print('rangeEnddate---',rangeEnddate)
    calculatedDate = datetime.strptime(get_date(output_format, estimatedDays, expectedDate), '%d-%m-%Y').date()
    #print('calculatedDate----',calculatedDate)
    while (calculatedDate <= rangeEnddate):
        #print(calculatedDate)
        #print(estimatedDays)
        dateRangeList.append(calculatedDate)
        calculatedDate = datetime.strptime(get_date(output_format, estimatedDays, calculatedDate), '%d-%m-%Y').date()
        #print('-----', datetime.strptime(get_date(output_format,estimatedDays ,calculatedDate), '%d-%m-%Y').date())
    return dateRangeList

dateRange = udf(dateRangebetween, types.ArrayType(types.StringType()))
addDays = 182
result = maintenance_final_join.withColumn('Part_Dates', dateRange(maintenance_final_join.Expected, maintenance_final_join.estimateddays)).show()
After executing I am getting this error:
TypeError: coercing to Unicode: need string or buffer, datetime.timedelta found
Upvotes: 0
Views: 1261
Reputation: 12205
First of all, could you please fix your indentation? Your dateRangebetween()
function is difficult to read as it is.
However, your problem is in this:
dateRangeList.append(calculatedDate)
calculatedDate = datetime.strptime(get_date(output_format,estimatedDays,
calculatedDate), '%d-%m-%Y').date()
Your calculatedDate is a date object (the result of .date()). You append this object, not its string representation, to dateRangeList and return the list. Your UDF, however, is declared to return ArrayType(StringType()), so it hands Spark an array of date objects where strings are expected.
I assume your intention was to use string representations. If you changed the append to
dateRangeList.append(calculatedDate.strftime("......"))
and inserted the correct format string in place of the dots, you would at least be returning strings instead of date objects.
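
To make that concrete, here is a minimal sketch of how the function could look if it parsed its inputs and returned only strings. It is not a drop-in fix, and several things in it are my assumptions: the name date_range_between and the parameters input_format, output_format and horizon_days are hypothetical, and the input format "%d.%b.%Y" is guessed from your sample rows. It also parses both arguments explicitly, because columns read from a CSV without a schema arrive as strings, and in Python 2 adding a timedelta to a unicode string raises exactly the kind of "coercing to Unicode" TypeError you quoted.

```python
from datetime import datetime, timedelta


def date_range_between(expected_date, estimated_days,
                       input_format="%d.%b.%Y",   # assumed from the sample data
                       output_format="%d-%m-%Y",  # same format the question uses
                       horizon_days=730):
    """Return dates spaced estimated_days apart, formatted as strings.

    The first date is expected_date + estimated_days; the series stops once
    it would pass expected_date + horizon_days.
    """
    # Both inputs may arrive as strings, so convert them explicitly.
    start = datetime.strptime(expected_date, input_format).date()
    step = timedelta(days=int(estimated_days))
    if step.days <= 0:
        # A non-positive step would never reach the end of the range.
        return []

    range_end = start + timedelta(days=horizon_days)
    dates = []
    current = start + step
    while current <= range_end:
        # Append the string representation, not the date object.
        dates.append(current.strftime(output_format))
        current += step
    return dates


if __name__ == "__main__":
    print(date_range_between("30.07.2017", "139", input_format="%d.%m.%Y"))
```

Because every element of the returned list is a string, the function matches a UDF declared as udf(date_range_between, types.ArrayType(types.StringType())). Note that "%b" depends on the locale's month abbreviations, so a numeric month format is safer if you control the input.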
Upvotes: 1