Reputation: 789
I am using PySpark 3.x.
Here is my question: I have a PySpark column with values as below.
col
7.6
5.40
Above is the data. 7.6 means 7 hours 6 minutes, and 5.40 means 5 hours 40 minutes.
Now I need a new column that converts hours.minutes to total minutes.
So 7.6 becomes (7 * 60) + 6 = 426 minutes, and 5.40 becomes (5 * 60) + 40 = 340 minutes.
The final output should be as below:
col    col1
7.6    426
5.40   340
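For clarity, here is the same arithmetic in plain Python (just to illustrate the expected result; the helper name is made up, and this is not the PySpark solution I am after):

def to_total_minutes(time_str):
    # "7.6" -> hours "7", minutes "6"; "5.40" -> hours "5", minutes "40"
    hours, minutes = time_str.split(".")
    return int(hours) * 60 + int(minutes)

print(to_total_minutes("7.6"))   # 426
print(to_total_minutes("5.40"))  # 340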
I would appreciate some help with this in PySpark.
Upvotes: 0
Views: 665
Reputation: 5032
You can use split to create individual columns from your input value. I have broken the value into hour and minutes based on the example you provided.
The general idea works for other patterns too, as long as you can split the value into parts and sum them up to calculate the total time.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.getOrCreate()

input_list = [
     (1,"7.6")
    ,(2,"4.59")
    ,(4,"5.00")
    ,(5,"0.10")
    ,(6,"7.19")
    ,(7,"1.42")
    ,(8,"6.23")
]

sparkDF = spark.createDataFrame(input_list, ['id', 'time_str'])

# Split on the literal dot to get the hour and minute parts, and cast each to a number
sparkDF = sparkDF.withColumn('hour', F.split(F.col('time_str'), r'\.').getItem(0).cast(DoubleType()))\
                 .withColumn('minutes', F.split(F.col('time_str'), r'\.').getItem(1).cast(DoubleType()))

# Total minutes = hours * 60 + minutes
sparkDF = sparkDF.withColumn('total_time_minutes', F.col('hour')*60 + F.col('minutes'))

sparkDF.show()
+---+--------+----+-------+------------------+
| id|time_str|hour|minutes|total_time_minutes|
+---+--------+----+-------+------------------+
| 1| 7.6| 7.0| 6.0| 426.0|
| 2| 4.59| 4.0| 59.0| 299.0|
| 4| 5.00| 5.0| 0.0| 300.0|
| 5| 0.10| 0.0| 10.0| 10.0|
| 6| 7.19| 7.0| 19.0| 439.0|
| 7| 1.42| 1.0| 42.0| 102.0|
| 8| 6.23| 6.0| 23.0| 383.0|
+---+--------+----+-------+------------------+
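If you do not need the intermediate hour and minutes columns, a more compact variant (a sketch under the same assumptions, splitting once and combining both parts in a single expression) would be:

parts = F.split(F.col('time_str'), r'\.')
sparkDF = sparkDF.withColumn(
    'total_time_minutes',
    parts.getItem(0).cast('double') * 60 + parts.getItem(1).cast('double')
)

One caveat on the format itself: the minute part is taken literally, so "7.6" is 7 hours 6 minutes while "7.60" would come out as 7 hours 60 minutes. Make sure the source data always stores the actual minute count after the dot.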
Upvotes: 1