Reputation: 129
I have a dataframe like this. I want to get "mert" ones in Names column and the Index of the Status column to the Name column is Active. I transform this from json but I can't filter. how can I do it?
+---+-----------------------------------------+------------------+
|ID |Names | Status |
+---+-----------------------------------------+------------------+
|1 |[[[aaaa, mert], [cccc, Doe]]] | [Active, Active] |
|2 |[[[aaa, Michael], [ggg, ]]] | [Active, Active] |
|3 |[[[cccc, mert], [gg, Merk ]]] | [Suspend, Active]|
|3 |[[[dddd, Angela], [fggg, Merl]]] | [Active, Suspend]|
+---+-----------------------------------------+------------------+
Upvotes: 0
Views: 143
Reputation: 4108
It is not clear if your data type is arrays or string. From the problem context it looks like array.
If it is array, then:
explode()
(twice)arrays_zip()
(so it can be refered by same index)df = spark.createDataFrame(data=[[1,[[["aaaa","mert"],["cccc","Doe"]]],["Active","Active"]],[2,[[["aaa","Michael"],["ggg",""]]],["Active","Active"]],[3,[[["cccc","mert"],["gg","Merk "]]],["Suspend","Active"]],[4,[[["dddd","Angela"],["fggg","Merl"]]],["Active","Suspend"]]], schema=["ID","Names","Status"])
df = df.withColumn("Names2", F.explode("Names")) \
.withColumn("Names2", F.explode("Names2")) \
.withColumn("Names_Status", F.arrays_zip("Names2", "Status")) \
.filter((F.array_contains(F.col("Names_Status").getField("Names2"), "mert")) \
& (F.array_contains(F.col("Names_Status").getField("Status"), "Active"))) \
.drop("Names2", "Names_Status")
[Out]:
+---+------------------------------+-----------------+
|ID |Names |Status |
+---+------------------------------+-----------------+
|1 |[[[aaaa, mert], [cccc, Doe]]] |[Active, Active] |
|3 |[[[cccc, mert], [gg, Merk ]]]|[Suspend, Active]|
+---+------------------------------+-----------------+
Upvotes: 1