mytabi

Reputation: 779

PySpark DateType() during creation of the DataFrame

Here is my source code, run in a Databricks notebook using Python:

from pyspark.sql.types import StructType, StructField, DateType

data = [('2021-01-01', '2021-01-02')]
schema1 = StructType([
    StructField("date1", DateType(), True),
    StructField("date2", DateType(), True)
])
spark.createDataFrame(data, schema1).show()

However, I got an error when running this.

Does anyone have an idea what is going wrong?

Upvotes: 3

Views: 5187

Answers (1)

Pav3k

Reputation: 909

You tried to pass string values into fields declared as DateType, which is why it failed.

I see two solutions:

  1. Use date-typed data
import datetime
from pyspark.sql.types import StructType, StructField, DateType

data = [(
    datetime.datetime.strptime('2021-01-01', "%Y-%m-%d").date(),
    datetime.datetime.strptime('2021-01-02', "%Y-%m-%d").date()
)]

schema1 = StructType([
    StructField("date1", DateType(), True),
    StructField("date2", DateType(), True)
])

df = spark.createDataFrame(data, schema1)

df.show()

# output:
+----------+----------+
|     date1|     date2|
+----------+----------+
|2021-01-01|2021-01-02|
+----------+----------+
  2. Don't use a schema at first; convert to the date type afterwards
from pyspark.sql import functions as F

data = [('2021-01-01','2021-01-02')] 
df = spark.createDataFrame(data)
df = df.select(*(F.to_date(c) for c in df.columns))

df.show()

# output:
+-----------+-----------+
|to_date(_1)|to_date(_2)|
+-----------+-----------+
| 2021-01-01| 2021-01-02|
+-----------+-----------+
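
A small addition to the second approach (a minimal sketch, not part of the original answer): if you want to keep the column names date1 and date2 from the question's schema instead of the generated _1 and _2, you can pass the names when creating the DataFrame and alias each converted column. This assumes the same data list and an existing spark session:

from pyspark.sql import functions as F

data = [('2021-01-01', '2021-01-02')]

# Name the columns up front instead of letting Spark default to _1, _2
df = spark.createDataFrame(data, ["date1", "date2"])

# Convert each string column to a date and keep its original name
df = df.select(*(F.to_date(F.col(c)).alias(c) for c in df.columns))

df.show()

# expected output:
# +----------+----------+
# |     date1|     date2|
# +----------+----------+
# |2021-01-01|2021-01-02|
# +----------+----------+

In the first approach, datetime.date(2021, 1, 1) would also work in place of the strptime(...).date() calls, since DateType accepts Python date objects.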

Upvotes: 3
