Brian Clements

Reputation: 29

Why is Apache Spark map() giving me a "not iterable" error?

Why does the following code, copied directly from Spark: The Definitive Guide, return an error?

df.select(map(col("Description"), col("InvoiceNo")).alias("complex_map"))\
  .selectExpr("complex_map['WHITE METAL LANTERN']").show(2)

Returns the following error:

TypeError: Column is not iterable

I'm assuming newer releases of Spark have changed the behavior of this code, but I'm having a hard time figuring out how to adjust for it to run.

Upvotes: 0

Views: 169

Answers (1)

blackbishop

Reputation: 32660

You are calling Python's built-in map function, which expects its second argument to be an iterable. A Column is not iterable, so you get that error message.
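To see why the built-in fails, here is a minimal sketch that needs no Spark installation. FakeColumn is a hypothetical stand-in, not the real pyspark.sql.Column; it only assumes that a Column refuses iteration by raising a TypeError with this message. The built-in map() calls iter() on its second argument as soon as it is constructed, which is where the error surfaces.

```python
class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column (not the real class).

    Assumption: like a Column, it refuses to be iterated over.
    """

    def __init__(self, name):
        self.name = name

    def __iter__(self):
        raise TypeError("Column is not iterable")


def try_builtin_map(first, second):
    """Call Python's built-in map() and return the error message, if any."""
    try:
        # map(func, iterable) calls iter() on `second` immediately,
        # before `first` is ever invoked as a function.
        map(first, second)
    except TypeError as exc:
        return str(exc)
    return None


message = try_builtin_map(FakeColumn("Description"), FakeColumn("InvoiceNo"))
print(message)
```

The same thing happens in the question's code whenever the name map still refers to the Python built-in rather than to a Spark function.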

What you're looking for is PySpark's create_map function (the equivalent of map in the Spark Scala API). Try this instead:

from pyspark.sql import functions as F

df.select(
    F.create_map(F.col("Description"), F.col("InvoiceNo")).alias("complex_map")
).select(
    F.col("complex_map")["WHITE METAL LANTERN"]
).show(2)

Upvotes: 2
