Chetan Munigangappa

Reputation: 105

PySpark flatMap error: TypeError: 'int' object is not iterable

This is the sample code from my book:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("spark://chetan-ThinkPad-E470:7077").setAppName("FlatMap")
sc = SparkContext(conf=conf)

numbersRDD = sc.parallelize([1, 2, 3, 4])
actionRDD = numbersRDD.flatMap(lambda x: x + x).collect()
for values in actionRDD:
    print(values)

I am getting this error: TypeError: 'int' object is not iterable

    at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
    at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more

Upvotes: 2

Views: 3849

Answers (1)

Ramesh Maharjan

Reputation: 41957

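The function you pass to flatMap must return an iterable (such as a list or tuple) for each element. Your lambda returns x + x, which is an int, so you cannot use flatMap with it.

flatMap flattens the iterables returned for each element into a single RDD, which is why it fails when it tries to iterate over an int.

Since your RDD holds plain integers and you want one output per input, you can use the map function instead: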

numbersRDD = sc.parallelize([1, 2, 3, 4])
actionRDD = numbersRDD.map(lambda x: x + x)

def printing(x):
    print(x)

actionRDD.foreach(printing)

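which should print (foreach runs on the executors, so on a cluster the output goes to their stdout and the order may vary)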

2
4
6
8
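
To see why the original flatMap call fails, here is a minimal plain-Python sketch of the flatMap semantics. It does not use Spark at all; flat_map is a hypothetical helper written for illustration, and it only models the "apply the function, then flatten one level" behaviour.

```python
from itertools import chain


def flat_map(func, items):
    """Hypothetical plain-Python model of flatMap: apply func, then flatten one level."""
    return list(chain.from_iterable(map(func, items)))


numbers = [1, 2, 3, 4]

# Returning a list per element works: each list is flattened into the result.
doubled = flat_map(lambda x: [x + x], numbers)
duplicated = flat_map(lambda x: [x, x], numbers)

# Returning a bare int fails, because flattening tries to iterate over it.
try:
    flat_map(lambda x: x + x, numbers)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(doubled)
print(duplicated)
print(error_message)
```

If you do want to keep flatMap in the PySpark code, wrapping the result in a list, as in numbersRDD.flatMap(lambda x: [x + x]), should avoid the TypeError in the same way.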

Upvotes: 1
