Reputation: 208
I want to store a column value from a table in a variable. The table looks like this:
+----------------------------+--------------------------------+------------------+
|timestamp |item_name |item_value |
+----------------------------+--------------------------------+------------------+
|2023-03-01T17:35:00.000+0000|[asset_min, hpc_max, off_median]|[0.32, 0.67, 0.32]|
|2023-03-01T17:30:00.000+0000|[asset_min, hpc_max, off_median]|[0.54, 0.54, 0.3] |
|2023-03-01T17:20:00.000+0000|[asset_min, hpc_max, off_median]|[0.34, 0.25, 0.67]|
+----------------------------+--------------------------------+------------------+
I want to truncate the timestamp to the hour (2023-03-01T17:00:00.000+0000) and store it in a variable.
The outcome I want is:
print(timestamp_hour)
'2023-03-01T17:00:00.000+0000'
Reputation: 3686
You can do this with the date_trunc function and then collecting the result. I've applied distinct() to reduce the data to one row before collecting (or to several rows, in no guaranteed order, if the timestamp column spans multiple hours).
After collecting, we take the first column of the first row.
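import pyspark.sql.functions as F

# Truncate every timestamp to the start of its hour, keep only distinct hours,
# then take the first column of the first collected Row (a Python datetime).
timestamp_hour = df.select(F.date_trunc('hour', 'timestamp')).distinct().collect()[0][0]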
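Note that date_trunc returns a timestamp column, so the collected value is a Python datetime.datetime rather than the string shown in the question. A minimal sketch for turning it into that string, assuming the collected value is already expressed in UTC (the "+0000" suffix is written literally, not derived from the value); format_hour is a hypothetical helper name:

```python
from datetime import datetime


def format_hour(dt: datetime) -> str:
    """Render a datetime as 'YYYY-MM-DDTHH:00:00.000+0000'.

    Assumes dt is in UTC; the '+0000' offset is appended literally.
    """
    # Truncate defensively in case dt still carries minutes or seconds
    dt = dt.replace(minute=0, second=0, microsecond=0)
    # Seconds are zero after truncation, so the millisecond part is always '.000'
    return dt.strftime('%Y-%m-%dT%H:%M:%S') + '.000+0000'


# Stand-in for the value returned by collect()[0][0]
print(format_hour(datetime(2023, 3, 1, 17, 0)))
```

Because the truncation is repeated inside the helper, it would also work if you collected a raw timestamp (for example the first row's) and skipped date_trunc.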
Reputation: 123
table = """
+----------------------------+--------------------------------+------------------+
|timestamp |item_name |item_value |
+----------------------------+--------------------------------+------------------+
|2023-03-01T17:35:00.000+0000|[asset_min, hpc_max, off_median]|[0.32, 0.67, 0.32]|
|2023-03-01T17:30:00.000+0000|[asset_min, hpc_max, off_median]|[0.54, 0.54, 0.3] |
|2023-03-01T17:20:00.000+0000|[asset_min, hpc_max, off_median]|[0.34, 0.25, 0.67]|
+----------------------------+--------------------------------+------------------+
"""
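def timestamp_hour(t_stamp: str) -> str:
    # Keep 'YYYY-MM-DDTHH' (the first 13 characters) and zero out the rest
    return f'{t_stamp[:13]}:00:00.000+0000'

ltable = table.splitlines()
for line in ltable:
    # Only data rows contain a year; the 28-character timestamp starts after the leading '|'
    if '2023-' in line:
        print(timestamp_hour(line[1:29]))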
Output:
2023-03-01T17:00:00.000+0000
2023-03-01T17:00:00.000+0000
2023-03-01T17:00:00.000+0000
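The check '2023-' in line only matches rows from 2023, and the fixed slice depends on the exact column layout. A sketch of a slightly more general variant that splits each row on '|' and also drops duplicate hours (similar in spirit to distinct() in the Spark answer). It assumes no cell contains a '|' character; distinct_hours is a hypothetical name:

```python
from typing import List

table = """
+----------------------------+--------------------------------+------------------+
|timestamp                   |item_name                       |item_value        |
+----------------------------+--------------------------------+------------------+
|2023-03-01T17:35:00.000+0000|[asset_min, hpc_max, off_median]|[0.32, 0.67, 0.32]|
|2023-03-01T17:30:00.000+0000|[asset_min, hpc_max, off_median]|[0.54, 0.54, 0.3] |
|2023-03-01T17:20:00.000+0000|[asset_min, hpc_max, off_median]|[0.34, 0.25, 0.67]|
+----------------------------+--------------------------------+------------------+
"""


def distinct_hours(table: str) -> List[str]:
    hours: List[str] = []
    for line in table.splitlines():
        cells = [c.strip() for c in line.split('|')]
        # Border and blank lines have no '|' separators; the header's first cell is not a date
        if len(cells) < 3 or not cells[1][:4].isdigit():
            continue
        hour = f'{cells[1][:13]}:00:00.000+0000'
        # Keep each hour once, in the order it first appears
        if hour not in hours:
            hours.append(hour)
    return hours


print(distinct_hours(table))
```

For the sample table this yields a single entry, so timestamp_hour = distinct_hours(table)[0] gives the value the question asks for.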
Reputation: 123
Maybe something like this:
ts = '2023-03-01T17:35:00.000+0000'

def timestamp_hour(t_stamp: str) -> str:
    # Keep 'YYYY-MM-DDTHH' and replace minutes, seconds and milliseconds with zeros
    return f'{t_stamp[:13]}:00:00.000+0000'

print(timestamp_hour(ts))
Output:
2023-03-01T17:00:00.000+0000
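Slicing never validates its input and always writes "+0000", whatever offset the original string had. A sketch of a parsing-based alternative using the standard library's datetime.strptime, which raises ValueError on malformed input and keeps the original offset; truncate_to_hour is a hypothetical name, and the format string assumes exactly the layout shown in the question:

```python
from datetime import datetime


def truncate_to_hour(t_stamp: str) -> str:
    # %f accepts the 3-digit milliseconds; %z accepts offsets such as +0000
    dt = datetime.strptime(t_stamp, '%Y-%m-%dT%H:%M:%S.%f%z')
    dt = dt.replace(minute=0, second=0, microsecond=0)
    # strftime('%z') writes the parsed offset back in +HHMM form
    return dt.strftime('%Y-%m-%dT%H:%M:%S') + '.000' + dt.strftime('%z')


print(truncate_to_hour('2023-03-01T17:35:00.000+0000'))
```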