Naveen Srikanth

Reputation: 789

PySpark: creating a timestamp column

I am using Spark 2.1.0 and I am not able to create a timestamp column in PySpark. This is the code snippet I am using:

df=df.withColumn('Age',lit(datetime.now()))

I am getting the following error:

AssertionError: col should be Column

Please help

Upvotes: 24

Views: 78859

Answers (3)

Nikhil Gupta

Reputation: 386

Adding to balalaika's answer: if, like me, you only want the date without the time, you can use the code below.

from pyspark.sql import functions as F
df.withColumn('Age', F.current_date())

Hope this helps

Upvotes: 4

balalaika

Reputation: 934

I am not sure about 2.1.0, but on 2.2.1 at least you can simply do:

from pyspark.sql import functions as F
df.withColumn('Age', F.current_timestamp())

Hope it helps!
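
For anyone who wants both variants in one place, here is a minimal sketch. The helper name `add_load_columns` and the column names `loaded_at` and `load_date` are made up for illustration and do not come from the question. It assumes `df` is an existing DataFrame and that PySpark is available when the function is called.

```python
def add_load_columns(df, ts_col="loaded_at", date_col="load_date"):
    """Return df with a timestamp column and a date column appended.

    The function and column names are illustrative assumptions.
    """
    # Imported inside the function so this file can be loaded even where
    # PySpark is not installed; move it to the top of the file if you prefer.
    from pyspark.sql import functions as F

    return (
        df.withColumn(ts_col, F.current_timestamp())  # timestamp column
          .withColumn(date_col, F.current_date())     # date column, no time part
    )
```

Calling `add_load_columns(df).printSchema()` should then list `loaded_at` as `timestamp` and `load_date` as `date`. Both functions are evaluated once per query, so every row gets the same value.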

Upvotes: 50

Ankush Singh

Reputation: 560

I am assuming you have the DataFrame from your code snippet and want the same timestamp in every row.

Let me create a dummy DataFrame first.

>>> rows = [{'name': 'Alice', 'age': 1}, {'name': 'Again', 'age': 2}]  # avoid shadowing the built-in dict
>>> df = spark.createDataFrame(rows)

>>> import time
>>> import datetime
>>> timestamp = datetime.datetime.fromtimestamp(time.time()).strftime('%Y-%m-%d %H:%M:%S')
>>> type(timestamp)
<class 'str'>

>>> from pyspark.sql.functions import lit,unix_timestamp
>>> timestamp
'2017-08-02 16:16:14'
>>> new_df = df.withColumn('time',unix_timestamp(lit(timestamp),'yyyy-MM-dd HH:mm:ss').cast("timestamp"))
>>> new_df.show(truncate = False)
+---+-----+---------------------+
|age|name |time                 |
+---+-----+---------------------+
|1  |Alice|2017-08-02 16:16:14.0|
|2  |Again|2017-08-02 16:16:14.0|
+---+-----+---------------------+

>>> new_df.printSchema()
root
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)
 |-- time: timestamp (nullable = true)
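
If all you need is one fixed timestamp on every row, the string round trip above can probably be skipped: as far as I know, `lit()` accepts a Python `datetime` object and produces a timestamp column directly. A minimal sketch follows; `snapshot_time` and `with_fixed_timestamp` are hypothetical helper names, and the microseconds are dropped only to match the output shown above.

```python
import datetime


def snapshot_time(now=None):
    """Return `now` (default: the current local time) truncated to whole seconds."""
    if now is None:
        now = datetime.datetime.now()
    return now.replace(microsecond=0)


def with_fixed_timestamp(df, column="time", when=None):
    """Add `column` holding the same timestamp on every row of df."""
    # Imported here so snapshot_time() can be used without PySpark installed.
    from pyspark.sql.functions import lit

    # The datetime is captured once on the driver, then attached as a literal.
    return df.withColumn(column, lit(snapshot_time(when)))
```

With the dummy DataFrame from above, `with_fixed_timestamp(df).printSchema()` should again report `time` as `timestamp`.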

Upvotes: 19
