Reputation: 313
I'm trying to infer the schema of a JSON file (Spark 2.4.5) that contains a timestamp field:
{
"timestampField":"08.06.2020 12:03:50"
}
StructType mySchema = spark.read()
.option("multiline", true)
.option("inferSchema", true)
.option("timestampFormat","MM.dd.yyyy HH:mm:ss")
.json("cdr_json_schema.json")
.schema();
The resulting schema still shows the field as a string:
root
|-- timestampField: string (nullable = true)
I also tried a file with the timestamp in the default format, reading it without any option():
{
"timestampField":"2020-08-06T12:03:50.412+03:00"
}
The field is still inferred as a string.
Upvotes: 0
Views: 520
Reputation: 313
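Alternatively, we can read the field as a string and convert it afterwards with to_timestamp() (available since Spark 2.2):
import static org.apache.spark.sql.functions.to_timestamp;

df.select(to_timestamp(df.col("timestampField"), "MM.dd.yyyy HH:mm:ss").as("convertedTimestamp"));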
Upvotes: 0
Reputation: 6323
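In Spark 2.4, JSON schema inference never infers a timestamp type from a string value, whatever format the string is in, which is why the ISO-formatted sample stays a string too (inferSchema is a CSV option and has no effect on JSON). The timestampFormat option only tells Spark how to parse values for columns that are already declared as TimestampType, so to read the field as a timestamp you need to specify the schema explicitly: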
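import org.apache.spark.sql.types.{DataTypes, StructField, StructType}
import spark.implicits._ // needed for toDS()

val data =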
"""
|{
|"timestampField":"08.06.2020 12:03:50"
|}
""".stripMargin
val df = spark.read.option("multiLine", true).json(Seq(data).toDS())
df.show(false)
df.printSchema()
/**
* +-------------------+
* |timestampField |
* +-------------------+
* |08.06.2020 12:03:50|
* +-------------------+
*
* root
* |-- timestampField: string (nullable = true)
*/
val df1 = spark.read
.schema(StructType(StructField("timestampField", DataTypes.TimestampType) :: Nil))
.option("multiLine", true)
.option("timestampFormat", "MM.dd.yyyy HH:mm:ss")
.json(Seq(data).toDS())
df1.show(false)
df1.printSchema()
/**
* +-------------------+
* |timestampField |
* +-------------------+
* |2020-08-06 12:03:50|
* +-------------------+
*
* root
* |-- timestampField: timestamp (nullable = true)
*/
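Since timestampFormat (and the pattern argument of to_timestamp()) only controls how the string is parsed, a mismatched pattern typically shows up as null values rather than an exception. A minimal sketch for sanity-checking a pattern outside Spark, assuming Java 8+ and plain java.time: the class name TimestampPatternCheck and its parse helper are hypothetical, and DateTimeFormatter is only a stand-in for the SimpleDateFormat-style parser Spark 2.4 uses, though these particular pattern letters mean the same in both. It also shows that MM.dd.yyyy reads "08.06.2020" as 6 August, matching the 2020-08-06 value printed above.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

public class TimestampPatternCheck {

    // Hypothetical helper: parses a timestamp string with the given pattern.
    // java.time is used only to check the pattern letters; it is not the
    // parser Spark 2.4 uses internally.
    static LocalDateTime parse(String value, String pattern) {
        return LocalDateTime.parse(value, DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        // Month first: "08.06.2020" is read as 6 August 2020.
        LocalDateTime monthFirst = parse("08.06.2020 12:03:50", "MM.dd.yyyy HH:mm:ss");
        System.out.println(monthFirst); // 2020-08-06T12:03:50

        // Day first would give 8 June 2020 instead.
        LocalDateTime dayFirst = parse("08.06.2020 12:03:50", "dd.MM.yyyy HH:mm:ss");
        System.out.println(dayFirst); // 2020-06-08T12:03:50

        // The second sample from the question is ISO-8601 with an offset.
        OffsetDateTime iso = OffsetDateTime.parse("2020-08-06T12:03:50.412+03:00");
        System.out.println(iso); // 2020-08-06T12:03:50.412+03:00
    }
}
```

If the day-first reading is the intended one, the pattern passed to Spark should be dd.MM.yyyy HH:mm:ss instead.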
Upvotes: 1