frlzjosh
frlzjosh

Reputation: 462

Unable to successfully extract timestamp to date and time in pyspark

I am having trouble extracting the time and date from a timestamp. The data is read as a string, so I saw that people are able to use date_format to essentially convert it into the format we want. Here is what I tried below

import pyspark.sql.functions as F
from pyspark.sql.functions import date_format

data = df_data.select(date_format(F.col("timestamp"),"MM/d/yy").alias("date"),
        date_format(F.col("timestamp"),"HH:mm").alias("time"),
        date_format(F.col("timestamp"), "M/d/yy").alias("current_date_formated")).show(10)

My input data looks like this:

  timestamp
11/9/18 14:11
11/9/18 14:27
11/9/18 14:42
11/9/18 14:57

The output of my code above is spewed as under my date and time alias returns rows of null. I have also tried using to_date but didn't work either

import pyspark.sql.functions as F

data = df_data.select(date_format(F.col("timestamp"),"MM/d/yy").alias("date"),
        F.to_date(F.col("timestamp"),"HH:mm").alias("time"),
        F.to_date(F.col("timestamp"), "M/d/yy").alias("current_date_formated")).show(10)

Upvotes: 1

Views: 572

Answers (1)

murtihash
murtihash

Reputation: 8410

You have the right idea with date_format. The only thing your missing is it to first and foremost convert your string timestamp to Timestamp Type. And then you can do your formatting. I assumed your input data is in the format of day-month-year, if not, you can switch it to "MM/dd/yy HH:mm".

from pyspark.sql import functions as F
df.withColumn("timestamp", F.to_timestamp("timestamp","dd/MM/yy HH:mm"))\
  .withColumn("time",F.date_format("timestamp","HH:mm"))\
  .withColumn("current_date_formated",F.date_format("timestamp", "M/d/yy")).show()

+-------------------+-----+---------------------+
|          timestamp| time|current_date_formated|
+-------------------+-----+---------------------+
|2018-09-11 14:11:00|14:11|              9/11/18|
|2018-09-11 14:27:00|14:27|              9/11/18|
|2018-09-11 14:42:00|14:42|              9/11/18|
|2018-09-11 14:57:00|14:57|              9/11/18|
+-------------------+-----+---------------------+

Upvotes: 1

Related Questions