Prasad Joshi

Reputation: 51

Not able to create a field with DateType using PySpark

I am trying to create a DataFrame from a sample record. One of the fields is of DateType, and I get an error for the value provided in that field. The code is below. The error is:

TypeError: field date: DateType can not accept object '2019-12-01' in type <class 'str'>

I tried converting the string to DateType using to_date, along with a few other approaches, but could not get it to work. Please advise.

from pyspark.sql.functions import to_date,col,lit,expr
from pyspark.sql.types import StructType,StructField,IntegerType,DateType,StringType
from pyspark.sql import Row

MySchema = StructType([ StructField("CustomerID",IntegerType(),True),
    StructField("Quantity",IntegerType(),True),
    StructField("date",DateType(),True)
    ])


myRow=Row(10,100,"2019-12-01")
mydf=spark.createDataFrame([myRow],MySchema)
display(mydf)

Upvotes: 5

Views: 9973

Answers (2)

hipokito

Reputation: 473

What works for me (I'm on Python 3.8.12 and Spark version 3.0.1):

from datetime import datetime
from pyspark.sql.types import DateType, StructType, StructField, IntegerType, Row
from pyspark.sql import SparkSession

MySchema = StructType([ StructField("CustomerID",IntegerType(),True),
StructField("Quantity",IntegerType(),True),
StructField("date",DateType(),True)
])

spark = SparkSession.builder.appName("local").master("local").getOrCreate()
myRow=Row(10,100,datetime(2019, 12, 1))
mydf=spark.createDataFrame([myRow],MySchema)
mydf.show(truncate=False) #I'm not on DataBricks, so I use mydf.show(truncate=False) instead of display
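
A quick check of why the datetime value is accepted where the string was not, as a minimal sketch using only the standard library: datetime.datetime is a subclass of datetime.date, and a plain str is not. The variable name value is only illustrative.

```python
from datetime import date, datetime

value = datetime(2019, 12, 1)

# datetime is a subclass of date, so it passes a date type check
print(isinstance(value, date))          # True

# a string is not a date, which matches the TypeError in the question
print(isinstance("2019-12-01", date))   # False

# .date() drops the time part if a plain date object is preferred
print(value.date())
```

Passing value.date() instead of value in the Row should behave the same for a DateType field, since only the calendar date is stored.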

Upvotes: 1

Saurabh

Reputation: 943

You can use the datetime class to parse the string into a date value before creating the row:

from datetime import datetime
from pyspark.sql import Row

# MySchema and spark are the ones defined in the question
myRow=Row(10,100,datetime.strptime('2019-12-01','%Y-%m-%d'))
mydf=spark.createDataFrame([myRow],MySchema)
mydf.show()

It should work.
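
If several records arrive with the date as a string, the same conversion can be applied to each one before building the DataFrame. The following is a rough sketch under that assumption; parse_date_field and raw_rows are hypothetical names, not part of PySpark.

```python
from datetime import date, datetime


def parse_date_field(row, index, fmt="%Y-%m-%d"):
    """Return a copy of the tuple `row` with the string at `index` parsed into a date."""
    values = list(row)
    values[index] = datetime.strptime(values[index], fmt).date()
    return tuple(values)


# Sample input shaped like the record in the question (date still a string)
raw_rows = [(10, 100, "2019-12-01")]

# Convert the third field of every row to datetime.date
parsed_rows = [parse_date_field(r, 2) for r in raw_rows]
print(parsed_rows)
```

The resulting parsed_rows list can then be passed in place of [myRow], as in spark.createDataFrame(parsed_rows, MySchema).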

Upvotes: 5
