MMV

Reputation: 208

AnalysisException: cannot resolve '`timestamp`' given input columns: [date_trunc(hour, timestamp)];

I have a table like below

| job_id| timestamp                      | avg_Tag_value |
|:----  |:------------------------------- | ------------- |
| j1    | 2023-03-19T01:52:00.000+0000    | 0.4           |
| j2    | 2023-03-19T01:53:00.000+0000    | 0.5           |
| j3    | 2023-03-19T01:54:00.000+0000    | 0.6           |

and I want to truncate the timestamp to the hour using df = df.select(date_trunc("hour", "timestamp"))

and I'm getting an error like "AnalysisException: cannot resolve 'timestamp' given input columns: [date_trunc(hour, timestamp)];"

The timestamp column is of timestamp type, so I'm not sure what's causing the error.

Can anyone help with this?

Thanks

Upvotes: 1

Views: 288

Answers (2)

Dipanjan Mallick

Reputation: 1739

Your code works for me (I'm guessing you haven't imported the functions module, pyspark.sql.functions):

from pyspark.sql.functions import *

# Create the input DataFrame
data = [("j1", "2023-03-19T01:52:00.000+0000", 0.4),
        ("j2", "2023-03-19T01:53:00.000+0000", 0.5),
        ("j3", "2023-03-19T01:54:00.000+0000", 0.6)]

schema = ["job_id", "timestamp", "avg_Tag_value"]
df = spark.createDataFrame(data, schema)

df = df.select(date_trunc("hour", "timestamp"))

df.show(truncate=False)

+---------------------------+
|date_trunc(hour, timestamp)|
+---------------------------+
|        2023-03-19 01:00:00|
|        2023-03-19 01:00:00|
|        2023-03-19 01:00:00|
+---------------------------+
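The output above matches what date_trunc("hour", ...) does: it keeps the date and hour and zeroes out everything smaller. As a plain-Python sketch of the same semantics (trunc_hour is a hypothetical helper, not part of PySpark):

```python
from datetime import datetime

def trunc_hour(ts: datetime) -> datetime:
    # Same idea as Spark's date_trunc("hour", col): keep the date and
    # hour, zero out minutes, seconds, and microseconds.
    return ts.replace(minute=0, second=0, microsecond=0)

print(trunc_hour(datetime(2023, 3, 19, 1, 52)))  # 2023-03-19 01:00:00
```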


Upvotes: 0

Vamsi Bitra

Reputation: 2764

I reproduced the same thing in my environment and got a similar error.

To resolve the issue, use the following code:

from pyspark.sql.functions import date_trunc, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# The timestamp arrives as an ISO-8601 string; date_trunc casts it to a timestamp
sch1 = StructType([
    StructField("job_id", StringType(), True),
    StructField("timestamp", StringType(), True),
    StructField("avg_Tag_value", DoubleType(), True)
])
d1 = [
    ("j1", "2023-03-19T01:52:00.000+0000", 0.4),
    ("j2", "2023-03-19T01:53:00.000+0000", 0.5),
    ("j3", "2023-03-19T01:54:00.000+0000", 0.6)
]
df = spark.createDataFrame(d1, sch1)

# Alias the truncated column and keep avg_Tag_value in the same select
df = df.select(date_trunc("hour", col("timestamp")).alias("hour"), "avg_Tag_value")

df.show()
 

Output:

+-------------------+-------------+
|               hour|avg_Tag_value|
+-------------------+-------------+
|2023-03-19 01:00:00|          0.4|
|2023-03-19 01:00:00|          0.5|
|2023-03-19 01:00:00|          0.6|
+-------------------+-------------+

Upvotes: 0
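A likely cause of the original error, worth noting: select returns a DataFrame containing only the requested expressions, so after df = df.select(date_trunc("hour", "timestamp")) the original timestamp column is gone (the surviving column is even renamed to date_trunc(hour, timestamp), exactly as the error message's "given input columns" list shows). Re-running that line a second time, e.g. in a notebook cell, would then fail. A toy plain-Python model of that behaviour (select_cols and the row dicts are hypothetical, purely for illustration):

```python
rows = [
    {"job_id": "j1", "timestamp": "2023-03-19T01:52:00", "avg_Tag_value": 0.4},
    {"job_id": "j2", "timestamp": "2023-03-19T01:53:00", "avg_Tag_value": 0.5},
]

def select_cols(rows, *cols):
    # Like DataFrame.select: the result keeps ONLY the requested
    # columns; everything else is dropped.
    missing = [c for c in cols if c not in rows[0]]
    if missing:
        raise ValueError(
            f"cannot resolve {missing} given input columns: {list(rows[0])}"
        )
    return [{c: r[c] for c in cols} for r in rows]

once = select_cols(rows, "timestamp")  # fine: timestamp still exists
try:
    select_cols(once, "job_id")        # fails: job_id was dropped above
except ValueError as e:
    print(e)
```

If you need to keep the other columns, selecting them explicitly (as in the second answer) or using withColumn avoids the problem.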
