Reputation: 125
I have a file with a timestamp column. When I try to read the file with a schema I defined myself, the date column is populated with null.
The source file has data as below:
created_date
31-AUG-2016 02:48:38
31-AUG-2016 10:37:59
31-AUG-2016 23:37:51
I am using the below code snippet:
from pyspark.sql.types import StructType, StructField, DateType
Raw_Schema = StructType([StructField("created_date", DateType(), True)])
DF = spark.read.format("csv").option("header", "true").schema(Raw_Schema).load("\path")
DF.display()
created_date
null
null
null
In the above, DF.display() shows null for all the inputs. However, my expected output is as below:
Created_Date
31-08-2016
31-08-2016
31-08-2016
Upvotes: 0
Views: 463
Reputation: 42422
You need to provide the date format because the format in the CSV file is non-standard.
df = (spark.read
      .format("csv")
      .option("header", "true")
      # matches values like 31-AUG-2016 02:48:38
      .option("dateFormat", "dd-MMM-yyyy HH:mm:ss")
      .schema(Raw_Schema)
      .load("filepath")
      )
df.show()
+------------+
|created_date|
+------------+
| 2016-08-31|
| 2016-08-31|
| 2016-08-31|
+------------+
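If you also need the time-of-day later, another option is to read the column as a timestamp and derive the date from it. This is a minimal sketch, assuming Spark 3.x; ts_schema, created_date_only, and the filepath placeholder are illustrative names, not from the original post:
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, TimestampType

# Parse the full "31-AUG-2016 02:48:38" values as timestamps
ts_schema = StructType([StructField("created_date", TimestampType(), True)])

df = (spark.read
      .format("csv")
      .option("header", "true")
      .option("timestampFormat", "dd-MMM-yyyy HH:mm:ss")
      .schema(ts_schema)
      .load("filepath")
      )

# Derive a date-only column from the parsed timestamp
df = df.withColumn("created_date_only", F.to_date("created_date"))
df.show(truncate=False)
This keeps the original timestamp available while still giving you the date you expected in the question.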
Upvotes: 2