RDD foreach method provides no results

Question

I am trying to understand how the foreach method works. In my Jupyter notebook, I tried:

def f(x): print(x)
a = sc.parallelize([1, 2, 3, 4, 5])
b = a.foreach(f)
print(type(b))

This runs without any problem, but the only output I get is from the print(type(b)) line. The foreach call doesn't return anything, just NoneType. I don't know what foreach is supposed to do or how to use it. Can you explain what it is used for?

desertnaut · Accepted Answer

foreach is an action and does not return anything, so you cannot use it the way you do, i.e. by assigning its result to another variable as in b = a.foreach(f); in PySpark, b will simply be None. From Learning Spark, p. 41-42:

Adapting the simple example from the docs, run in a PySpark terminal:

>>> def f(x): print(x)
>>> a = sc.parallelize([1, 2, 3, 4, 5])
>>> a.foreach(f)
5
4
3
1
2

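(NOTE: not sure about Jupyter, but the above code will not produce any printed output in a Databricks notebook. The function f runs on the executors, so whatever it prints goes to their stdout or logs rather than to the notebook cell.)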

You may also find the answers in this thread helpful.
