Accessing to elements of an array in Row object format and concatenate them- pySpark

Question

I have a pyspark.sql.dataframe.DataFrame, where one of the columns has an array of Row objects:

    +------------------------------------------------------------------------------------------------+
    |column                                                                          |
    +------------------------------------------------------------------------------------------------+
    |[Row(arrival='2019-12-25 19:55', departure='2019-12-25 18:22'),                                 |
    |  Row(arrival='2019-12-26 14:56', departure='2019-12-26 08:52')]                                |
    +------------------------------------------------------------------------------------------------+

Not all the rows in the column have the same quantity of elements (in this case, we have 2 but we could have more).

What I am trying to do is to generate a concatenation of the hours of each date, to have something like this:

18:22_19:55_08:52_14:56

This means, the departure time of the first element, concatenated with the arrival time of the first element, concatenated again with the departure time of the second element and once again with the arrival time of the second element.

Is there a simple way to do so using pyspark?

Accessing to elements of an array in Row object format and concatenate them- pySpark

Answers (1)

Related Questions