Reputation: 789
I am using Spark 2.1.0 and I am not able to create a timestamp column in PySpark. This is the code snippet I am using:
df=df.withColumn('Age',lit(datetime.now()))
I am getting this error:
AssertionError: col should be Column
Please help.
Upvotes: 24
Views: 78859
Reputation: 386
Adding on to balalaika's answer: if, like me, you only want to add the date without the time, you can use the code below.
from pyspark.sql import functions as F
df.withColumn('Age', F.current_date())
Hope this helps
Upvotes: 4
Reputation: 934
I am not sure about 2.1.0, but on 2.2.1 at least you can simply do:
from pyspark.sql import functions as F
df.withColumn('Age', F.current_timestamp())
Hope it helps!
Upvotes: 50
Reputation: 560
This assumes you have the DataFrame from your code snippet and want the same timestamp in every row.
First, let me create a dummy DataFrame.
>>> rows = [{'name': 'Alice', 'age': 1}, {'name': 'Again', 'age': 2}]  # avoid shadowing the built-in dict
>>> df = spark.createDataFrame(rows)
>>> import time
>>> import datetime
>>> timestamp = datetime.datetime.fromtimestamp(time.time()).strftime('%Y-%m-%d %H:%M:%S')
>>> type(timestamp)
<class 'str'>
>>> from pyspark.sql.functions import lit,unix_timestamp
>>> timestamp
'2017-08-02 16:16:14'
>>> new_df = df.withColumn('time',unix_timestamp(lit(timestamp),'yyyy-MM-dd HH:mm:ss').cast("timestamp"))
>>> new_df.show(truncate = False)
+---+-----+---------------------+
|age|name |time                 |
+---+-----+---------------------+
|1  |Alice|2017-08-02 16:16:14.0|
|2  |Again|2017-08-02 16:16:14.0|
+---+-----+---------------------+
>>> new_df.printSchema()
root
|-- age: long (nullable = true)
|-- name: string (nullable = true)
|-- time: timestamp (nullable = true)
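The approach above relies on two format patterns describing the same layout: the Python `strftime` pattern that builds the string, and the Java-style pattern handed to `unix_timestamp`. Here is a minimal plain-Python sketch of keeping the two side by side and checking the string before it goes to Spark. The names `PY_FORMAT`, `SPARK_FORMAT` and `spark_timestamp_args` are hypothetical helpers for illustration, not part of PySpark.

```python
from datetime import datetime

# Assumed pairing: both patterns describe "year-month-day hour:minute:second".
PY_FORMAT = "%Y-%m-%d %H:%M:%S"       # Python strftime/strptime pattern
SPARK_FORMAT = "yyyy-MM-dd HH:mm:ss"  # pattern passed to unix_timestamp above


def spark_timestamp_args(moment):
    """Return (text, pattern) for use as unix_timestamp(lit(text), pattern)."""
    return moment.strftime(PY_FORMAT), SPARK_FORMAT


moment = datetime(2017, 8, 2, 16, 16, 14)
text, pattern = spark_timestamp_args(moment)

# Parsing the text back with the same Python pattern should recover the value,
# which confirms the string has the layout the Spark pattern expects.
assert datetime.strptime(text, PY_FORMAT) == moment
print(text, pattern)
```

With these two values, the `withColumn` call from the answer would become `unix_timestamp(lit(text), pattern).cast("timestamp")`. Note that the string has second precision, so any microseconds in the original `datetime` are dropped.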
Upvotes: 19