Zsolt Pentek

Reputation: 25

pyspark sort array by its inner arrays' value

I have the following df:

+----+------------------------------------------------------------------------------------------+
|  id|                                                                                   id_info|
+----+------------------------------------------------------------------------------------------+
|id_1|[[1, 8, 2, "bar"], [5, 9, 2, "foo"], [4, 3, 2, "something"], [9, null, 2, "this_is_null"]]|
+----+------------------------------------------------------------------------------------------+

I would like the inner arrays sorted by their second element in descending order, so:

+----+------------------------------------------------------------------------------------------+
|  id|                                                                                   id_info|
+----+------------------------------------------------------------------------------------------+
|id_1|[[5, 9, 2, "foo"], [1, 8, 2, "bar"], [4, 3, 2, "something"], [9, null, 2, "this_is_null"]]|
+----+------------------------------------------------------------------------------------------+

I came up with something like this:

from pyspark.sql import functions as F, types as T

def def_sort(x):
    return sorted(x, key=lambda x: x[1], reverse=True)

udf_sort = F.udf(def_sort, T.ArrayType(T.ArrayType(T.IntegerType())))
df.select("id", udf_sort("id_info"))

I'm not sure how to handle null values this way. Also, is there maybe a built-in function for this? Can I somehow do it with F.array_sort?

Upvotes: 1

Views: 1883

Answers (2)

werner

Reputation: 14845

The elements of the array contain integers and a string, so I assume that the column id_info is an array of structs.

So the schema of the input data would be similar to

root
 |-- id: string (nullable = true)
 |-- id_info: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- col1: integer (nullable = true)
 |    |    |-- col2: integer (nullable = true)
 |    |    |-- col3: integer (nullable = true)
 |    |    |-- col4: string (nullable = true)

The names of the elements of the struct might be different.
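
For reference, a DataFrame with this shape can be built from plain Python tuples plus a DDL schema string (a sketch; the field names col1 ... col4 are placeholders, as noted above):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Tuples map onto the struct fields; None becomes a null col2.
df = spark.createDataFrame(
    [("id_1", [(1, 8, 2, "bar"), (5, 9, 2, "foo"),
               (4, 3, 2, "something"), (9, None, 2, "this_is_null")])],
    "id string, id_info array<struct<col1:int, col2:int, col3:int, col4:string>>",
)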

With this schema information we can use array_sort to order the array:

df.selectExpr(
    "array_sort(id_info, (l, r) -> "
    "case when l['col2'] > r['col2'] then -1 else 1 end) as sorted"
).show(truncate=False)

prints

+----------------------------------------------------------------------------------+
|sorted                                                                            |
+----------------------------------------------------------------------------------+
|[{5, 9, 2, foo}, {1, 8, 2, bar}, {4, 3, 2, something}, {9, null, 2, this_is_null}]|
+----------------------------------------------------------------------------------+
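
Note that array_sort with a comparator lambda requires Spark 3.0+. If you prefer to stay in the DataFrame API, the same expression can be wrapped in F.expr. Also, since any comparison against a null col2 falls through to the else branch, an explicit null check makes the "nulls last" ordering deliberate rather than incidental (a sketch, reusing the assumed col2 field name):

from pyspark.sql import functions as F

sorted_df = df.select(
    "id",
    F.expr(
        "array_sort(id_info, (l, r) -> "
        "case when l['col2'] is null then 1 "
        "when r['col2'] is null then -1 "
        "when l['col2'] > r['col2'] then -1 else 1 end)"
    ).alias("id_info"),
)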

Upvotes: 3

anky

Reputation: 75080

You can try explode followed by orderBy on id and on the second element in descending order, then groupBy + collect_list:

from pyspark.sql import functions as F

out = (sdf.select("*", F.explode("id_info").alias("element"))
          .withColumn("second_ele", F.element_at("element", 2))
          .orderBy("id", F.desc("second_ele"))
          .groupBy("id").agg(F.collect_list("element").alias("id_info"))
       )

out.show(truncate=False)

+----+-----------------------------------------------------------------------+
|id  |id_info                                                                |
+----+-----------------------------------------------------------------------+
|id_1|[[5, 9, 2, null], [1, 8, 2, null], [4, 3, 2, null], [9, null, 2, null]]|
+----+-----------------------------------------------------------------------+
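
One caveat: Spark does not formally guarantee that collect_list preserves the row order produced by a preceding orderBy. A more defensive variant (a sketch against the same sdf) carries the sort key into the aggregation and sorts the collected array instead, relying on sort_array ordering structs by their first field:

from pyspark.sql import functions as F

out = (sdf.select("id", F.explode("id_info").alias("element"))
          .groupBy("id")
          .agg(F.sort_array(
                   F.collect_list(F.struct(
                       F.element_at("element", 2).alias("key"),
                       F.col("element").alias("value"))),
                   asc=False).alias("kv"))
          # Descending struct order sorts by "key" first and, with these
          # rows, leaves the null key last; keep only the elements.
          .select("id", F.col("kv.value").alias("id_info"))
       )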

Upvotes: 3
