Chique_Code

Reputation: 1530

Convert string type column to datetime in PySpark

I have a column Time in my Spark DataFrame. It is a string type, and I need to convert it to datetime format. I have tried the following:

data.select(unix_timestamp(data.Time, 'yyyy/MM/dd HH:mm:ss').cast(TimestampType()).alias("timestamp"))

data.printSchema()

The output is:

root
 |-- Time: string (nullable = true)

If I save it in a new df, I lose all of my other columns.

Upvotes: 1

Views: 2720

Answers (1)

Sanket9394

Reputation: 2091

You can use withColumn instead of select. withColumn adds the new column to the existing DataFrame, so the other columns are preserved:

from pyspark.sql.functions import unix_timestamp
from pyspark.sql.types import TimestampType

data = spark.createDataFrame([('1997/02/28 10:30:00', "test")], ['Time', 'Col_Test'])

df = data.withColumn("timestamp", unix_timestamp(data.Time, 'yyyy/MM/dd HH:mm:ss').cast(TimestampType()))

Output:

>>> df.show()
+-------------------+--------+-------------------+
|               Time|Col_Test|          timestamp|
+-------------------+--------+-------------------+
|1997/02/28 10:30:00|    test|1997-02-28 10:30:00|
+-------------------+--------+-------------------+

>>> data.printSchema()
root
 |-- Time: string (nullable = true)
 |-- Col_Test: string (nullable = true)

>>> df.printSchema()
root
 |-- Time: string (nullable = true)
 |-- Col_Test: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)
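
As a side note beyond the original answer: on Spark 2.2+ you can also parse the string in a single step with to_timestamp, which avoids the unix_timestamp/cast pair. A minimal sketch, assuming the same Time column and date pattern as above:

from pyspark.sql.functions import to_timestamp

# Parse the 'Time' string directly into a timestamp column,
# using the same 'yyyy/MM/dd HH:mm:ss' pattern as the answer.
df2 = data.withColumn("timestamp", to_timestamp(data.Time, 'yyyy/MM/dd HH:mm:ss'))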

Upvotes: 1
