veganzombie
veganzombie

Reputation: 129

Nested Row Logic - Pyspark Dataframe

I have a dataframe like this. I want to get "mert" ones in Names column and the Index of the Status column to the Name column is Active. I transform this from json but I can't filter. how can I do it?

+---+-----------------------------------------+------------------+
|ID |Names                                    | Status           |
+---+-----------------------------------------+------------------+
|1  |[[[aaaa, mert], [cccc, Doe]]]            | [Active, Active] |
|2  |[[[aaa, Michael], [ggg, ]]]              | [Active, Active] |
|3  |[[[cccc, mert], [gg, Merk  ]]]           | [Suspend, Active]|
|3  |[[[dddd, Angela], [fggg, Merl]]]         | [Active, Suspend]|
+---+-----------------------------------------+------------------+

Upvotes: 0

Views: 143

Answers (1)

Azhar Khan
Azhar Khan

Reputation: 4108

It is not clear if your data type is arrays or string. From the problem context it looks like array.

If it is array, then:

  • remove outer layers of arrays with explode() (twice)
  • zip "Names" & "Status" using arrays_zip() (so it can be refered by same index)
  • filter record if array contains required value
df = spark.createDataFrame(data=[[1,[[["aaaa","mert"],["cccc","Doe"]]],["Active","Active"]],[2,[[["aaa","Michael"],["ggg",""]]],["Active","Active"]],[3,[[["cccc","mert"],["gg","Merk  "]]],["Suspend","Active"]],[4,[[["dddd","Angela"],["fggg","Merl"]]],["Active","Suspend"]]], schema=["ID","Names","Status"])

df = df.withColumn("Names2", F.explode("Names")) \
       .withColumn("Names2", F.explode("Names2")) \
       .withColumn("Names_Status", F.arrays_zip("Names2", "Status")) \
       .filter((F.array_contains(F.col("Names_Status").getField("Names2"), "mert")) \
               & (F.array_contains(F.col("Names_Status").getField("Status"), "Active"))) \
       .drop("Names2", "Names_Status")

[Out]:
+---+------------------------------+-----------------+
|ID |Names                         |Status           |
+---+------------------------------+-----------------+
|1  |[[[aaaa, mert], [cccc, Doe]]] |[Active, Active] |
|3  |[[[cccc, mert], [gg, Merk  ]]]|[Suspend, Active]|
+---+------------------------------+-----------------+

Upvotes: 1

Related Questions