Spark can't infer timestamp in Java

Question

I'm trying to infer the schema from a JSON file (Spark 2.4.5):

{
"timestampField":"08.06.2020 12:03:50"
}

        StructType mySchema = spark.read()
            .option("multiline", true)
            .option("inferSchema", true)
            .option("timestampFormat","MM.dd.yyyy HH:mm:ss")
            .json("cdr_json_schema.json")
            .schema();

    root
     |-- timestampField: string (nullable = true)

I have also tried a file in the default timestamp format, reading it without the timestampFormat option:

{
"timestampField":"2020-08-06T12:03:50.412+03:00"
}

It is still inferred as a string.

Som · Accepted Answer

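timestampFormat is only applied to columns that are already of timestamp type. In Spark 2.4 the JSON reader does not infer timestamps from string values, so the field stays a string. To have the column read as a timestamp from JSON input, you need to specify the schema explicitly, as below: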


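    import org.apache.spark.sql.types.{DataTypes, StructField, StructType}
    import spark.implicits._ // needed for toDS(); assumes a SparkSession named `spark`

    val data =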
      """
        |{
        |"timestampField":"08.06.2020 12:03:50"
        |}
      """.stripMargin
    val df = spark.read.option("multiLine", true).json(Seq(data).toDS())
    df.show(false)
    df.printSchema()
    /**
      * +-------------------+
      * |timestampField     |
      * +-------------------+
      * |08.06.2020 12:03:50|
      * +-------------------+
      *
      * root
      * |-- timestampField: string (nullable = true)
      */

    // Supply the schema explicitly so the field is parsed with timestampFormat
    val df1 = spark.read
      .schema(StructType(StructField("timestampField", DataTypes.TimestampType) :: Nil))
      .option("multiLine", true)
      .option("timestampFormat", "MM.dd.yyyy HH:mm:ss")
      .json(Seq(data).toDS())
    df1.show(false)
    df1.printSchema()

    /**
      * +-------------------+
      * |timestampField     |
      * +-------------------+
      * |2020-08-06 12:03:50|
      * +-------------------+
      *
      * root
      * |-- timestampField: timestamp (nullable = true)
      */
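
Note that the pattern string also decides which number is treated as the month. Here is a minimal sketch of that, using plain java.time rather than Spark's own parser (an assumption made for illustration; the class name `TimestampPatternDemo` is hypothetical). With `MM.dd.yyyy` the sample value is read as 6 August, which matches the output above, while `dd.MM.yyyy` would read it as 8 June:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration only; it does not call Spark.
final class TimestampPatternDemo {

    // Parses the text with the given date pattern using java.time.
    static LocalDateTime parse(String text, String pattern) {
        return LocalDateTime.parse(text, DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        String raw = "08.06.2020 12:03:50";

        // MM comes first, so 08 is the month and 06 is the day.
        LocalDateTime monthFirst = parse(raw, "MM.dd.yyyy HH:mm:ss");
        // dd comes first, so 08 is the day and 06 is the month.
        LocalDateTime dayFirst = parse(raw, "dd.MM.yyyy HH:mm:ss");

        System.out.println(monthFirst); // 2020-08-06T12:03:50
        System.out.println(dayFirst);   // 2020-06-08T12:03:50
    }
}
```

If the source data is day-first, the timestampFormat option passed to the reader would presumably need to be `dd.MM.yyyy HH:mm:ss` instead.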
