I have a Spark DataFrame as follows:
+-------+-------+-----+
| s1 | s2 |isVal|
+-------+-------+-----+
|a |aa | 1|
|b |bb | 0|
|c |cc | 1|
|d |dd | 0|
|e |ee | 1|
+-------+-------+-----+
I want to check the isVal value in each row; if it equals 1, that row should be split into two rows. For example, considering the first two rows of the DataFrame above, the result should be:
+-------+-------+
| s1 | isVal|
+-------+-------+
|a | 1|
|aa | 1|
|b | 0|
+-------+-------+
Please help me build this logic in Python. I tried flatMap, but it did not produce the expected result.
Here is one solution using array and explode:
>>> from pyspark.sql.functions import array, col, explode
>>>
>>> d = [{'s1': 'a', 's2': 'aa', 'isVal': 1}, {'s1': 'b', 's2': 'bb', 'isVal': 0}, {'s1': 'c', 's2': 'cc', 'isVal': 1}, {'s1': 'd', 's2': 'dd', 'isVal': 0}]
>>> df = spark.createDataFrame(d)
>>> df.show()
+-----+---+---+
|isVal| s1| s2|
+-----+---+---+
| 1| a| aa|
| 0| b| bb|
| 1| c| cc|
| 0| d| dd|
+-----+---+---+
>>> df1 = (df.where(col("isVal") == 1)
...          .withColumn("s3", array(col("s1"), col("s2")))
...          .select(col("s3"), col("isVal"))
...          .withColumn("s1", explode(col("s3")))
...          .drop(col("s3")))
>>> df0 = df.select(col("isVal"), col("s1")).where(col("isVal")==0)
>>> df2 = df1.union(df0)
>>> df2.show()
+-----+---+
|isVal| s1|
+-----+---+
| 1| a|
| 1| aa|
| 1| c|
| 1| cc|
| 0| b|
| 0| d|
+-----+---+