a.moussa

Reputation: 3287

How to create a TimestampType column in Spark from a string

I have some data in an Array of String like the one below (just as an example):

val myArray = Array("1499955986039", "1499955986051", "1499955986122")

I want to map this array to an array of Timestamp, in order to create an RDD (myRdd) and then a DataFrame like this:

val df = spark.createDataFrame(myRdd, StructType(Seq(StructField("myTymeStamp", TimestampType, true))))

My question is not how to create the RDD, but how to convert each string into a millisecond-precision timestamp. Do you have any idea? Thanks

Upvotes: 6

Views: 26826

Answers (2)

koiralo

Reputation: 23109

You don't need to convert to timestamps in a separate step beforehand. Convert each string to a Long while building the rows, and let the schema declare the column as a timestamp when creating the DataFrame, as below:

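import java.sql.Timestamp
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructField, StructType, TimestampType}

val myArray = Array("1499955986039", "1499955986051", "1499955986122")

// A Row field declared as TimestampType must hold a java.sql.Timestamp, not a bare Long;
// the strings are epoch milliseconds, which is what the Timestamp constructor takes
val myrdd = spark.sparkContext.parallelize(myArray.map(a => Row(new Timestamp(a.toLong))))

val df = spark.createDataFrame(myrdd, StructType(Seq(StructField("myTymeStamp", TimestampType, true))))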

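Otherwise, you can create the DataFrame from the raw strings and cast the column to timestamp afterwards, as below:

import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType, TimestampType}
import spark.implicits._ // enables the $"column" syntax

val stringRdd = spark.sparkContext.parallelize(myArray.map(a => Row(a)))
val stringDf = spark.createDataFrame(stringRdd, StructType(Seq(StructField("myTymeStamp", StringType, true))))

// Cast String to Long, divide by 1000 (a numeric-to-timestamp cast reads seconds), then cast to timestamp
stringDf.withColumn("myTymeStamp", ($"myTymeStamp".cast(LongType) / 1000).cast(TimestampType))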
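
Spark's cast from a numeric type to timestamp treats the number as seconds since the epoch, while the strings in the question hold milliseconds. The following is a plain Scala sketch (no Spark required) of that unit arithmetic. The object and method names are hypothetical and exist only for this illustration.

```scala
import java.time.Instant

object EpochMillisSketch {
  // Parses a string of epoch milliseconds into an Instant.
  def fromMillisString(raw: String): Instant =
    Instant.ofEpochMilli(raw.toLong)

  // Splits epoch milliseconds into whole seconds and the leftover milliseconds.
  def splitSeconds(raw: String): (Long, Long) = {
    val millis = raw.toLong
    (millis / 1000, millis % 1000)
  }

  def main(args: Array[String]): Unit = {
    val raw = "1499955986039" // first value from the question
    println(fromMillisString(raw)) // 2017-07-13T14:26:26.039Z
    println(splitSeconds(raw))     // (1499955986,39)
  }
}
```

Reading the same number as seconds instead of milliseconds would land tens of thousands of years in the future, which is why the unit matters when casting.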

Hope this helps!

Upvotes: 4

akuiper

Reputation: 215057

Use java.sql.Timestamp:

val myArray = Array("1499955986039", "1499955986051", "1499955986122")
import java.sql.Timestamp    
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, TimestampType}

val rdd = sc.parallelize(myArray).map(s => Row(new Timestamp(s.toLong)))

val schema = StructType(Array(StructField("myTymeStamp", TimestampType, true)))

spark.createDataFrame(rdd, schema)
// res25: org.apache.spark.sql.DataFrame = [myTymeStamp: timestamp]
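
If the RDD is not needed for anything else, a shorter route may be to map the strings locally and call toDF. This is a minimal sketch, assuming Spark 2.x or later running with a local master. The object name TimestampColumnSketch and the helper build are hypothetical, not part of Spark.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.{DataFrame, SparkSession}

object TimestampColumnSketch {
  // Builds a single-column DataFrame of timestamps from epoch-millisecond strings.
  def build(spark: SparkSession, raw: Seq[String]): DataFrame = {
    import spark.implicits._ // brings the encoder for java.sql.Timestamp and toDF into scope
    raw.map(s => new Timestamp(s.toLong)).toDF("myTymeStamp")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this illustration
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("timestamp-sketch")
      .getOrCreate()

    val df = build(spark, Seq("1499955986039", "1499955986051", "1499955986122"))
    df.printSchema()
    df.show(truncate = false) // displayed values depend on the session time zone

    spark.stop()
  }
}
```

This skips the explicit StructType because the column type is derived from the element type of the Seq.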

Upvotes: 11
