Deepak_Spark_Beginner

Reputation: 273

Convert a column to timestamp in Spark

I have a string column with a value like

"Sat Jan 23 19:23:32 +0000 2010".

How can I convert this into a timestamp in Spark?

Below is my code:

package sample.spark.com;

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;

public class test2 {

    public static void main(String[] args) {
        SparkSession session = SparkSession.builder().appName("ProspectPerson").master("local[*]").getOrCreate();

        session.sparkContext().setLogLevel("ERROR");

        // Single-row input holding the string to convert
        List<String> test1 = new ArrayList<>();
        test1.add("Sat Jan 23 19:23:32 +0000 2010");

        // Also tried: EEE MMM dd HH:mm:ss xxxx yyyy
        Dataset<Row> initialDF = session.createDataset(test1, Encoders.STRING()).toDF();
        initialDF = initialDF.withColumn("test", functions.to_timestamp(initialDF.col("value"), "EE MMM dd HH:mm:ss xxxx yyyy"));
        initialDF.show(false);
    }

}

If I exclude the day of the week (i.e. "Sat"), the pattern "MMM dd HH:mm:ss xxxx yyyy" works well with "Jan 23 19:23:32 +0000 2010". So the question is: how should I write the datetime pattern so that it also handles the day of the week?

Upvotes: 1

Views: 948

Answers (1)

mck

Reputation: 42392

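It seems the datetime parser cannot parse the day of the week in the input string. If you are on Spark 3.0 or later, this matches the documented behaviour: the pattern letter E is supported only for formatting (e.g. date_format), not for parsing (e.g. to_timestamp). You can remove that part using substring (assuming that it always has 3 letters + 1 space):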

initialDF = initialDF.withColumn(
    "test", 
    functions.expr("to_timestamp(substring(value, 5), 'MMM dd HH:mm:ss xxxx yyyy')")
);
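
If you want to sanity-check the remaining pattern and the substring offset without starting a Spark session, here is a minimal plain-Java sketch. It relies on the assumption that Spark 3.x's parser is built on java.time and accepts the same letters for this pattern; java.time itself is more permissive than Spark, so this only checks the "MMM dd HH:mm:ss xxxx yyyy" part and the offset, not Spark's own behaviour. The class and method names are hypothetical.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Plain-Java sketch (no Spark involved); class and method names are hypothetical.
class StripDayOfWeek {

    // Same pattern as in the answer, with an explicit English locale so "Jan" is recognised.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MMM dd HH:mm:ss xxxx yyyy", Locale.ENGLISH);

    // Drops the first 4 characters ("Sat ") and parses the rest.
    // Java's substring(4) is 0-based, so it corresponds to Spark SQL's 1-based substring(value, 5).
    static OffsetDateTime parseWithoutDayOfWeek(String raw) {
        return OffsetDateTime.parse(raw.substring(4), FORMAT);
    }

    public static void main(String[] args) {
        OffsetDateTime ts = parseWithoutDayOfWeek("Sat Jan 23 19:23:32 +0000 2010");
        // Print as a UTC instant
        System.out.println(ts.toInstant());
    }
}
```

Like the Spark expression above, this assumes the day-of-week prefix is always exactly 3 letters followed by 1 space; input that does not fit that shape will fail to parse or be cut in the wrong place.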

Upvotes: 0
