Raju

Reputation: 185

Spark DF: Split array to multiple rows

I created a Spark dataframe from MongoDB data (in Databricks, using a Python notebook):

[image: Dataframe]

I need to convert this dataframe to:

[image: required output]

How can I do this?

Upvotes: 2

Views: 1762

Answers (1)

QuantStats

Reputation: 1486

Here is one proposed solution. You can organize your sal field into an array using $concatArrays in MongoDB before exporting it to Spark. Then run something like this:

#df
#+---+-----+------------------+
#| id|empno|               sal|
#+---+-----+------------------+
#|  1|  101|[1000, 2000, 1500]|
#|  2|  102|      [1000, 1500]|
#|  3|  103|      [2000, 3000]|
#+---+-----+------------------+

import pyspark.sql.functions as F

# explode() emits one output row per element of the 'sal' array,
# repeating the other selected columns for each element
df_new = df.select('id', 'empno', F.explode('sal').alias('sal'))

#df_new.show()
#+---+-----+----+
#| id|empno| sal|
#+---+-----+----+
#|  1|  101|1000|
#|  1|  101|2000|
#|  1|  101|1500|
#|  2|  102|1000|
#|  2|  102|1500|
#|  3|  103|2000|
#|  3|  103|3000|
#+---+-----+----+
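For readers without a Spark session handy, the effect of explode can be sketched in plain Python on data matching the frame above (the tuples here are just an illustration, not Spark API):

```python
# Rows of (id, empno, sal) mirroring the example dataframe
rows = [
    (1, 101, [1000, 2000, 1500]),
    (2, 102, [1000, 1500]),
    (3, 103, [2000, 3000]),
]

# "Explode": one output row per element of the sal array,
# with id and empno repeated for each element
exploded = [(i, emp, s) for (i, emp, sal) in rows for s in sal]

print(exploded)
# [(1, 101, 1000), (1, 101, 2000), (1, 101, 1500),
#  (2, 102, 1000), (2, 102, 1500), (3, 103, 2000), (3, 103, 3000)]
```

Note that, like F.explode, this drops rows whose array is empty; in Spark you would use F.explode_outer to keep them with a null sal instead.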
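On the MongoDB side, the $concatArrays step mentioned above might look like the following aggregation stage. The field names sal1 and sal2 are assumptions about your schema; adapt them to the fields you actually need to merge:

```python
# Hypothetical $project stage merging two array fields into one 'sal'
# array before the collection is read into Spark.
pipeline = [
    {"$project": {
        "empno": 1,
        # $concatArrays concatenates the listed arrays in order
        "sal": {"$concatArrays": ["$sal1", "$sal2"]},
    }}
]
```

You would pass this pipeline to your collection's aggregate() call (or to the Spark MongoDB connector's pipeline option) so the exported documents already carry a single sal array.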

Upvotes: 3
