JPS

Reputation: 15

TypeError: Column is not iterable - Using map() and explode() in pyspark

from pyspark.sql import Row
from pyspark.sql.functions import col
df = spark.sparkContext.parallelize([
 Row(name='Angel', age=5, height=None,weight=40,desc = "Where is Angel"),
 Row(name='Bobby', age=None, height=40,weight=50,desc = "This is Bobby")
]).toDF()

df.select(map(col("desc"), col("age")).alias("complex_map"))\
  .selectExpr("explode(complex_map)").show(2)

Running the above code raises an error: TypeError: Column is not iterable

Please let me know where I am going wrong.

Upvotes: 1

Views: 637

Answers (1)

mck

Reputation: 42412

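You need to use the create_map function from pyspark.sql.functions, not Python's built-in map. The built-in map(func, iterable) tries to iterate over its second argument, and a Column cannot be iterated, hence the TypeError: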

import pyspark.sql.functions as F

df.select(F.create_map(F.col("desc"), F.col("age")).alias("complex_map"))\
  .selectExpr("explode(complex_map)").show(2)

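To simplify the code further, you can call explode directly. Exploding a map produces two columns (key and value), so the alias needs two names:

df.select(
    F.explode(
        F.create_map(F.col("desc"), F.col("age"))
    ).alias("key", "value")
).show(2)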
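
If it helps to see where the message comes from, here is a minimal sketch in plain Python that needs no Spark session. FakeColumn and try_builtin_map are hypothetical names made up for this illustration. FakeColumn is only a stand-in, written on the assumption that PySpark's Column rejects iteration in roughly this way; it is not the real class.

```python
class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column; not the real class."""

    def __init__(self, name):
        self.name = name

    def __iter__(self):
        # Assumption: mimics how a PySpark Column refuses to be iterated.
        raise TypeError("Column is not iterable")


def try_builtin_map(first, second):
    """Call Python's built-in map and return the error message, if any."""
    try:
        # map(func, iterable) asks for an iterator over `second` right away,
        # before `first` is ever called.
        map(first, second)
    except TypeError as exc:
        return str(exc)
    return None


message = try_builtin_map(FakeColumn("desc"), FakeColumn("age"))
print(message)
```

The built-in map never gets as far as building anything map-like; it fails as soon as it tries to iterate over the second argument, which is why create_map is the function to reach for here.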

Upvotes: 3
