Reputation: 185
I created a Spark DataFrame from MongoDB data (in Databricks, using a Python notebook).
I need to convert this DataFrame as
How can I do this?
Upvotes: 2
Views: 1762
Reputation: 1486
Here is one proposed solution. You can organize your sal field into arrays using $concatArrays
in MongoDB before exporting the data to Spark, then run something like this on the resulting DataFrame:
#df
#+---+-----+------------------+
#| id|empno| sal|
#+---+-----+------------------+
#| 1| 101|[1000, 2000, 1500]|
#| 2| 102| [1000, 1500]|
#| 3| 103| [2000, 3000]|
#+---+-----+------------------+
import pyspark.sql.functions as F

# explode() emits one output row per element of the 'sal' array,
# repeating the values of the other selected columns for each element
df_new = df.select('id', 'empno', F.explode('sal').alias('sal'))
#df_new.show()
#+---+-----+----+
#| id|empno| sal|
#+---+-----+----+
#| 1| 101|1000|
#| 1| 101|2000|
#| 1| 101|1500|
#| 2| 102|1000|
#| 2| 102|1500|
#| 3| 103|2000|
#| 3| 103|3000|
#+---+-----+----+
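If it helps to see what explode is doing without a Spark session, the same flattening can be sketched in plain Python on the sample data above (the tuples here just mirror the example rows; this is an illustration, not part of the Spark API):

```python
# Sample rows mirroring the DataFrame above: (id, empno, sal-array)
rows = [
    (1, 101, [1000, 2000, 1500]),
    (2, 102, [1000, 1500]),
    (3, 103, [2000, 3000]),
]

# One output row per array element, with id and empno repeated --
# the same shape F.explode('sal') produces.
exploded = [(i, emp, s) for (i, emp, sal) in rows for s in sal]

for r in exploded:
    print(r)
```

Note that, like this list comprehension, explode drops rows whose array is empty or null; use F.explode_outer if you need to keep them.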
Upvotes: 3