Naveen Srikanth
Reputation: 789

Pyspark Type Conversion Issue from Date to String

I am using PySpark 2.1. Below is the content of my DataFrame:

expecteddays,date

139,30.JUl.2017

134,01.NOV.2018

My output should look like this:

138,30.JUL.2017,<30/SEP/2018,4/FEB/2019>

The last column is populated by my functions dateRangebetween and get_date, shown below.

Below is my code

from datetime import datetime, timedelta

import pandas as pd
from pyspark import SparkContext
from pyspark.sql import SparkSession, types
from pyspark.sql.functions import concat, explode, udf
from pyspark.sql.types import ArrayType, IntegerType, StringType, StructField, StructType

maintenance_final_join = spark.read.csv('/user/NaveenSri/adh_dev_engg/test.csv', header=True)

def get_date(dateFormat="%d-%m-%Y", addDays=0 ,timeNow=0 ): 
    #print('inside get date',timesNow)
    if (addDays!=0):
        anotherTime = timeNow + timedelta(days=addDays)
    else:
        anotherTime = timeNow
    return anotherTime.strftime(dateFormat)

def dateRangebetween(expectedDate, estimatedDays):
    output_format = '%d-%m-%Y'
    dateRangeList = []
    j = 2
    #print('inside Date range',expectedDate)
    rangeEnddate = datetime.strptime(get_date(output_format, 730, expectedDate), '%d-%m-%Y').date()
    #print('rangeEnddate---',rangeEnddate)
    calculatedDate = datetime.strptime(get_date(output_format, estimatedDays, expectedDate), '%d-%m-%Y').date()
    #print('calculatedDate----',calculatedDate)

    while (calculatedDate <= rangeEnddate):
        #print(calculatedDate)
        #print(estimatedDays)
        dateRangeList.append(calculatedDate)
        calculatedDate = datetime.strptime(get_date(output_format, estimatedDays, calculatedDate), '%d-%m-%Y').date()

    #print('-----', datetime.strptime(get_date(output_format,estimatedDays ,calculatedDate), '%d-%m-%Y').date())
    return dateRangeList

dateRange = udf(dateRangebetween, types.ArrayType(types.StringType()))
addDays=182
result = maintenance_final_join.withColumn('Part_Dates',dateRange(maintenance_final_join.Expected,maintenance_final_join.estimateddays)).show()

After executing I am getting this error:

TypeError: coercing to Unicode: need string or buffer, datetime.timedelta found

Upvotes: 0

Views: 1261

Answers (1)

Hannu

Reputation: 12205

First of all, could you please fix your indentation? Your dateRangebetween() function is difficult to read as it is.

However, your problem is in this:

dateRangeList.append(calculatedDate)
calculatedDate = datetime.strptime(get_date(output_format,estimatedDays, 
        calculatedDate), '%d-%m-%Y').date()

Your calculatedDate is a date object (the result of .date()). You append this object, not its string representation, to dateRangeList and return that list. However, your UDF is declared with types.ArrayType(types.StringType()), so Spark expects an array of strings, not an array of date objects.

I assume your intention was to use string representations. If you changed the append to

dateRangeList.append(calculatedDate.strftime("......"))

and inserted the correct format string in place of dots, you would at least be processing string objects instead of datetimes.
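As a rough illustration of that change, here is a minimal sketch in plain Python. The names date_range_between, INPUT_FORMAT, OUTPUT_FORMAT and horizon_days are hypothetical, and the two format strings are only guesses based on the sample rows in the question (30.JUL.2017 in, 30/SEP/2018 out). It also assumes English month abbreviations. One more point worth checking: the exact message in the question is what Python 2 raises when a timedelta is added to a unicode string, which suggests the CSV values reach get_date as strings, so the sketch parses both arguments before doing any arithmetic.

```python
from datetime import datetime, timedelta

# Assumed formats, inferred from the sample data in the question.
INPUT_FORMAT = "%d.%b.%Y"    # e.g. 30.JUL.2017
OUTPUT_FORMAT = "%d/%b/%Y"   # e.g. 30/SEP/2018


def date_range_between(expected_date, estimated_days, horizon_days=730):
    """Return date strings spaced estimated_days apart, up to horizon_days after the start."""
    # Columns read with spark.read.csv(..., header=True) arrive as strings,
    # so convert them before doing any date arithmetic.
    start = datetime.strptime(expected_date, INPUT_FORMAT).date()
    step = timedelta(days=int(estimated_days))
    if step <= timedelta(0):
        return []  # a non-positive step would never reach the end of the range

    range_end = start + timedelta(days=horizon_days)
    dates = []
    current = start + step
    while current <= range_end:
        # Append the string form so the result matches ArrayType(StringType()).
        dates.append(current.strftime(OUTPUT_FORMAT).upper())
        current += step
    return dates


print(date_range_between("30.JUL.2017", "139"))
```

If this fits your data, the function could be wrapped the same way as in the question, with udf(date_range_between, types.ArrayType(types.StringType())), since it now returns a list of strings.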

Upvotes: 1
