Artem

Reputation: 477

Text file sent to Spark worker looks empty or not found

I want to send a basic config file to every Spark worker. The config file is written for Python's configobj. I specify it when submitting the job:

$ ./bin/spark-submit --files .../config.cfg .../spark_str_hello.py

But when I try to read it, it turns out that the file doesn't exist there. When I print config.sections (which should return a list), an empty list is printed. Below is a basic word count example. I also tried to initialize the config on the workers with foreachRDD, with the same result. Is there any special way to send text files to Spark workers?

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from configobj import ConfigObj

config = ConfigObj('config.cfg')


sc = SparkContext()
ssc = StreamingContext(sc, 1)
lines = ssc.socketTextStream('localhost', 9999)
words = lines.flatMap(lambda x: x.split(' '))
pairs = words.map(lambda x: (x, 1))
wordCount = pairs.reduceByKey(lambda x, y: x + y)
print config.sections

pairs.pprint()
ssc.start()
ssc.awaitTermination()

Upvotes: 0

Views: 513

Answers (1)

Justin Pihony

Reputation: 67135

You need to use SparkFiles.get("FILE") to get the local path of a file sent via --files, and open the file through that path rather than through its bare name.
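As a rough sketch of how that could look in the script from the question: the helper below resolves the shipped file's name to a local path and fails loudly if nothing is there. The names shipped_file_path and resolver are made up for this example; only SparkFiles.get comes from pyspark. The explicit existence check is there on the assumption that ConfigObj, with its default file_error=False, returns an empty config for a missing file instead of raising, which would explain the empty config.sections.

```python
import os


def shipped_file_path(name, resolver=None):
    """Return the local path of a file distributed with --files.

    `shipped_file_path` and `resolver` are hypothetical names used only
    in this sketch; `resolver` exists so the helper can be tried out
    without a running Spark job.
    """
    if resolver is None:
        # Imported lazily. SparkFiles.get should be called only after the
        # SparkContext has been created (driver) or inside a task (worker).
        from pyspark import SparkFiles
        resolver = SparkFiles.get
    path = resolver(name)
    if not os.path.isfile(path):
        # Fail loudly rather than handing the parser a missing file.
        raise IOError("shipped file not found: %s" % path)
    return path


# Intended use in the job from the question (assumes pyspark and configobj
# are installed and the job was submitted with --files .../config.cfg):
#   sc = SparkContext()
#   config = ConfigObj(shipped_file_path('config.cfg'))
#   print(config.sections)
```

Note that in this sketch the config is loaded after SparkContext() is created, whereas the question's script builds the ConfigObj before the context exists.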

Upvotes: 1
