Naveen Srikanth

Reputation: 789

pyspark convert column hours.min to minutes

I am using PySpark 3.x.

I have a PySpark column with the following values:

 col

 7.6
 5.40

In the data above, 7.6 means 7 hours 6 minutes and 5.40 means 5 hours 40 minutes.

I need a new column that converts these hours.minutes values into total minutes.

So 7.6 becomes (7 * 60) + 6 = 426 minutes, and 5.40 becomes (5 * 60) + 40 = 340 minutes.

The final output should look like this:

col col1

7.6   426
5.40  340
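
For reference, the rule above can be written as a minimal plain-Python sketch (not PySpark) to check the expected numbers. The function name `to_minutes` is hypothetical, and the sketch assumes every value is a string of the form hours.minutes.

```python
def to_minutes(value: str) -> int:
    """Convert an 'hours.minutes' string such as '7.6' to total minutes."""
    # Split on the first dot: the left part is hours, the right part is minutes.
    hours, minutes = value.split(".", 1)
    return int(hours) * 60 + int(minutes)


for raw in ["7.6", "5.40"]:
    print(raw, to_minutes(raw))
```

This only confirms the arithmetic; it does not operate on a DataFrame column.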

I would appreciate some help doing this in PySpark.

Upvotes: 0

Views: 665

Answers (1)

Vaebhav

Reputation: 5032

You can use split to break each input value into separate columns. I have split the values into hours and minutes, based on the example you provided.

The general idea also works for other patterns, as long as you can split the value into parts and sum them to get the total time.
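
As a rough illustration of that general idea outside Spark, the plain-Python sketch below splits a value on the dot, weights each part, and sums the results. The hours.minutes.seconds input and the names `SECONDS_PER_PART` and `to_total_minutes` are hypothetical and not part of the question.

```python
# Seconds contributed by one unit of each dot-separated part.
# The third entry (seconds) is a hypothetical extension, not from the question.
SECONDS_PER_PART = (3600, 60, 1)


def to_total_minutes(value: str) -> float:
    """Split on '.', weight each part, sum, and express the total in minutes."""
    parts = value.split(".")
    if len(parts) > len(SECONDS_PER_PART):
        raise ValueError(f"too many parts in {value!r}")
    total_seconds = sum(int(p) * w for p, w in zip(parts, SECONDS_PER_PART))
    return total_seconds / 60


print(to_total_minutes("7.6"))      # hours.minutes
print(to_total_minutes("1.30.30"))  # hypothetical hours.minutes.seconds
```

In Spark the same shape would be one split, one getItem per part, and a weighted sum of the resulting columns.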

Example -

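from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.getOrCreate()

# time_str is kept as a string so that "7.6" and "5.40" keep their digits as written
input_list = [
    (1, "7.6"),
    (2, "4.59"),
    (4, "5.00"),
    (5, "0.10"),
    (6, "7.19"),
    (7, "1.42"),
    (8, "6.23"),
]

sparkDF = spark.createDataFrame(input_list, ['id', 'time_str'])

# split takes a regex pattern, so the dot is escaped to match a literal "."
parts = F.split(F.col('time_str'), r'\.')

sparkDF = sparkDF.withColumn('hour', parts.getItem(0).cast(DoubleType()))\
                 .withColumn('minutes', parts.getItem(1).cast(DoubleType()))

# Total minutes = hours * 60 + minutes
sparkDF = sparkDF.withColumn('total_time_minutes', F.col('hour') * 60 + F.col('minutes'))

sparkDF.show()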

+---+--------+----+-------+------------------+
| id|time_str|hour|minutes|total_time_minutes|
+---+--------+----+-------+------------------+
|  1|     7.6| 7.0|    6.0|             426.0|
|  2|    4.59| 4.0|   59.0|             299.0|
|  4|    5.00| 5.0|    0.0|             300.0|
|  5|    0.10| 0.0|   10.0|              10.0|
|  6|    7.19| 7.0|   19.0|             439.0|
|  7|    1.42| 1.0|   42.0|             102.0|
|  8|    6.23| 6.0|   23.0|             383.0|
+---+--------+----+-------+------------------+
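
One caveat worth checking, offered as an assumption about the input rather than something stated in the question: the split approach relies on the column being a string. If the values were stored as numbers, the trailing zero in 5.40 would be lost before the split, so 40 minutes would be read as 4. The plain-Python sketch below illustrates the effect; `to_minutes` is a hypothetical helper that mirrors the split logic above.

```python
def to_minutes(value: str) -> int:
    """Same hours * 60 + minutes rule, applied to an 'hours.minutes' string."""
    hours, minutes = value.split(".", 1)
    return int(hours) * 60 + int(minutes)


as_string = "5.40"
as_number = 5.40  # a float does not remember the trailing zero

print(to_minutes(as_string))       # the split sees "40"
print(to_minutes(str(as_number)))  # str(5.4) is "5.4", so the split sees "4"
```

If the source column is numeric, the minutes cannot be recovered reliably this way, so it is safer to read the data as strings in the first place.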

Upvotes: 1
