Dilma

Reputation: 645

How to split a DataFrame row into two rows for a given condition - Python

I have a Spark DataFrame as follows:

+-------+-------+-----+
|  s1   |   s2  |isVal|
+-------+-------+-----+
|a      |aa     |    1|
|b      |bb     |    0|
|c      |cc     |    1|
|d      |dd     |    0|
|e      |ee     |    1|
+-------+-------+-----+

I want to check the isVal value in each row, and if that value is equal to 1, the row should be split into two rows. For example, considering the first two rows of the above DataFrame, the result should be as follows:

+-------+-------+
|  s1   |  isVal|
+-------+-------+
|a      |      1|
|aa     |      1|
|b      |      0|
+-------+-------+

Please help me build this logic in Python. I tried using flatMap, but it did not produce the expected result.

Upvotes: 0

Views: 908

Answers (1)

abaghel

Reputation: 15317

Here is a solution using `array` and `explode`:

>>> from pyspark.sql.functions import array, col, explode
>>>
>>> d = [{'s1': 'a', 's2': 'aa', 'isVal':1}, {'s1': 'b', 's2': 'bb', 'isVal':0}, {'s1': 'c', 's2': 'cc', 'isVal':1}, {'s1': 'd', 's2': 'dd', 'isVal':0}]
>>> df = spark.createDataFrame(d)
>>> df.show()
+-----+---+---+
|isVal| s1| s2|
+-----+---+---+
|    1|  a| aa|
|    0|  b| bb|
|    1|  c| cc|
|    0|  d| dd|
+-----+---+---+

>>> df1 = df.where(col("isVal")==1).withColumn("s3",array(col("s1"), col("s2"))).select(col("s3"), col("isVal")).withColumn("s1",explode(col("s3"))).drop(col("s3"))
>>> df0 = df.select(col("isVal"), col("s1")).where(col("isVal")==0)
>>> df2 = df1.union(df0)
>>> df2.show()
+-----+---+
|isVal| s1|
+-----+---+
|    1|  a|
|    1| aa|
|    1|  c|
|    1| cc|
|    0|  b|
|    0|  d|
+-----+---+
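The flatMap route the question mentions can also work. Below is a minimal sketch of the per-row logic in plain Python (so it runs without a Spark session); the function and variable names are illustrative. In PySpark, the same function would be passed to `df.rdd.flatMap(...)`, and the result wrapped back into a DataFrame with `spark.createDataFrame(...)`:

```python
# Sketch of the per-row splitting logic for a flatMap-based approach.
# Rows with isVal == 1 yield two output rows (one per string column);
# rows with isVal == 0 yield a single row.
def split_row(row):
    s1, s2, is_val = row
    if is_val == 1:
        return [(s1, 1), (s2, 1)]
    return [(s1, 0)]

# Plain-Python demonstration on sample rows (s1, s2, isVal):
rows = [("a", "aa", 1), ("b", "bb", 0), ("c", "cc", 1)]
result = [out for r in rows for out in split_row(r)]
# result is [("a", 1), ("aa", 1), ("b", 0), ("c", 1), ("cc", 1)]

# In PySpark (assumed usage, not tested here):
#   df.rdd.flatMap(split_row)  ->  spark.createDataFrame(_, ["s1", "isVal"])
```

Unlike the union-based answer above, flatMap preserves the original row order, since each input row emits its output rows in place.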

Upvotes: 1
