Nikunj Kakadiya

Reputation: 2998

create array of columns in pyspark

I have a dataframe with a single row and multiple columns. I would like to convert it into multiple rows. I found a similar question here on Stack Overflow.

That question shows how it can be done in Scala, but I want to do this in PySpark. I tried to replicate the code in PySpark but wasn't able to.

I am not able to convert the Scala code below to Python:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, lit, map}

// Interleave each column name (as a literal) with its value column
val columnsAndValues: Array[Column] = df1.columns.flatMap(c => Array(lit(c), col(c)))
val df2 = df1.withColumn("myMap", map(columnsAndValues: _*))

Upvotes: 0

Views: 1210

Answers (1)

blackbishop

Reputation: 32660

In PySpark you can use the create_map function to create a map column, and itertools.chain over a list comprehension to get the equivalent of Scala's flatMap:

import itertools
from pyspark.sql import functions as F

# Flatten (lit(name), col(name)) pairs into one sequence: name1, value1, name2, value2, ...
columns_and_values = itertools.chain(*[(F.lit(c), F.col(c)) for c in df1.columns])
df2 = df1.withColumn("myMap", F.create_map(*columns_and_values))
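
As a minimal end-to-end sketch (using a hypothetical single-row DataFrame named df1), you can then explode the map column to get one row per original column, which is what the question is ultimately after:

import itertools
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df1 = spark.createDataFrame([("a", "b", "c")], ["x", "y", "z"])

columns_and_values = itertools.chain(*[(F.lit(c), F.col(c)) for c in df1.columns])
df2 = df1.withColumn("myMap", F.create_map(*columns_and_values))

# explode emits one (key, value) row per map entry, i.e. one row per column
df2.select(F.explode("myMap").alias("column", "value")).show()
# +------+-----+
# |column|value|
# +------+-----+
# |     x|    a|
# |     y|    b|
# |     z|    c|
# +------+-----+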

Upvotes: 1
