Reputation: 208
I have a table like below
| job_id| timestamp | avg_Tag_value |
|:---- |:------------------------------- | ------------- |
| j1 | 2023-03-19T01:52:00.000+0000 | 0.4 |
| j2 | 2023-03-19T01:53:00.000+0000 | 0.5 |
| j3 | 2023-03-19T01:54:00.000+0000 | 0.6 |
and I want to truncate the timestamp to the hour using df = df.select(date_trunc("hour", "timestamp")),
but I'm getting an error like "AnalysisException: cannot resolve 'timestamp' given input columns: [date_trunc(hour, timestamp)];"
The timestamp column is of timestamp type, so I'm not sure what's causing the error.
Can anyone help with this?
Thanks
Upvotes: 1
Views: 288
Reputation: 1739
Your code works for me (I'm guessing you haven't imported the pyspark.sql.functions module):
from pyspark.sql.functions import *
# Create the input DataFrame
data = [("j1", "2023-03-19T01:52:00.000+0000", 0.4),
("j2", "2023-03-19T01:53:00.000+0000", 0.5),
("j3", "2023-03-19T01:54:00.000+0000", 0.6)]
schema = ["job_id", "timestamp", "avg_Tag_value"]
df = spark.createDataFrame(data, schema)
# Truncate each timestamp down to the start of its hour
df = df.select(date_trunc("hour", "timestamp"))
df.show(truncate=False)
+---------------------------+
|date_trunc(hour, timestamp)|
+---------------------------+
| 2023-03-19 01:00:00|
| 2023-03-19 01:00:00|
| 2023-03-19 01:00:00|
+---------------------------+
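If you'd rather avoid the wildcard import, you can import date_trunc explicitly. And if you want to keep the original columns next to the truncated value, withColumn works instead of select (the hour_ts column name below is just illustrative):
from pyspark.sql.functions import date_trunc
# Add the truncated hour as a new column instead of replacing the existing ones
df = df.withColumn("hour_ts", date_trunc("hour", "timestamp"))
df.show(truncate=False)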
Upvotes: 0
Reputation: 2764
I reproduced the same thing in my environment and got a similar error. To resolve this issue, please follow this code:
from pyspark.sql import SparkSession
from pyspark.sql.functions import date_trunc, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.getOrCreate()

# Define the schema -- the timestamp values are read in as strings here
sch1 = StructType([
    StructField("job_id", StringType(), True),
    StructField("timestamp", StringType(), True),
    StructField("avg_Tag_value", DoubleType(), True)
])

d1 = [
    ("j1", "2023-03-19T01:52:00.000+0000", 0.4),
    ("j2", "2023-03-19T01:53:00.000+0000", 0.5),
    ("j3", "2023-03-19T01:54:00.000+0000", 0.6)
]

df = spark.createDataFrame(d1, sch1)

# Truncate the timestamp to the hour and keep the avg_Tag_value column
df = df.select(date_trunc("hour", col("timestamp")).alias("hour"), "avg_Tag_value")
df.show()
Output:
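Assuming a UTC session time zone (as in the output shown in the other answer), df.show() should print something like:
+-------------------+-------------+
|               hour|avg_Tag_value|
+-------------------+-------------+
|2023-03-19 01:00:00|          0.4|
|2023-03-19 01:00:00|          0.5|
|2023-03-19 01:00:00|          0.6|
+-------------------+-------------+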
Upvotes: 0