jakrm

Reputation: 183

Adding a date column from a variable value to a Spark DataFrame

I have a Spark DataFrame as below, and I am trying to add a new date column from a variable, but it gives an error.

jsonDF.printSchema()

root
 |-- Data: struct (nullable = true)
 |    |-- Record: struct (nullable = true)
 |    |    |-- FName: string (nullable = true)
 |    |    |-- LName: long (nullable = true)
 |    |    |-- Address: struct (nullable = true)
 |    |    |    |-- Applicant: array (nullable = true)
 |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |-- Id: long (nullable = true)
 |    |    |    |    |    |-- Type: string (nullable = true)
 |    |    |    |    |    |-- Option: long (nullable = true)
 |    |    |    |-- Location: string (nullable = true)
 |    |    |    |-- Town: long (nullable = true)
 |    |    |-- IsActive: boolean (nullable = true)
 |-- Id: string (nullable = true)

I tried both of the following approaches:

from pyspark.sql.functions import to_date

var_date='2019-07-15'

jsonDF.withColumn('my_date',to_date(var_date,'yyyy-MM-dd'))

jsonDF.select(to_date(var_date,'yyyy-MM-dd')).alias('my_date')

But both attempts fail with an AnalysisException:

An error occurred while calling o50.withColumn.
: org.apache.spark.sql.AnalysisException: cannot resolve '`2019-07-15`' given input columns: [Data, Id];;
'Project [Data#8, Id#9, to_date('2019-07-15, Some(yyyy-MM-dd)) AS my_date#213]
+- Relation[Data#8, Id#11] json

An error occurred while calling o50.select.
: org.apache.spark.sql.AnalysisException: cannot resolve '`2019-07-15`' given input columns: [Data, Id];;
'Project [to_date('2019-07-15, Some(yyyy-MM-dd)) AS to_date(`2019-07-15`, 'yyyy-MM-dd'#210]

Kindly help.

Upvotes: 1

Views: 15634

Answers (2)

Aleksey Ivashov

Reputation: 1

from datetime import datetime
from pyspark.sql import functions as F

var_date='2019-07-15'

# Parse the string into a Python date, then embed it as a literal column
jsonDF.withColumn('my_date', F.lit(datetime.strptime(var_date, '%Y-%m-%d').date()))
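
This works because F.lit accepts a Python datetime.date directly and produces a DateType literal column, so no format string or parsing step is involved.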

Upvotes: 0

Steven

Reputation: 15258

According to the official documentation, to_date takes a column as its parameter. Therefore, Spark is trying to resolve a column named 2019-07-15.

You have to convert your value to a column first, then apply the function.

from pyspark.sql import functions as F

var_date='2019-07-15'
jsonDF.select(F.to_date(F.lit(var_date),'yyyy-MM-dd').alias('my_date'))
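
The same fix works with withColumn if you want to keep the existing Data and Id columns rather than select only the new one; a minimal sketch against the same jsonDF:

from pyspark.sql import functions as F

var_date='2019-07-15'
# Wrap the plain string in a literal column before passing it to to_date
jsonDF.withColumn('my_date', F.to_date(F.lit(var_date), 'yyyy-MM-dd'))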

Another way to do it is to use a Python datetime directly:

import datetime
from pyspark.sql import functions as F

var_date=datetime.date(2019,7,15)
jsonDF.select(F.lit(var_date).alias('my_date'))
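
Because F.lit receives a datetime.date here rather than a string, the resulting my_date column is already DateType with no parsing needed; calling printSchema() on the result should show my_date as date.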

Upvotes: 3
