piterd

Reputation: 117

cannot send pyspark output to a file in the local file system

I'm running a PySpark job on Spark (single node, standalone) and trying to save the output to a text file in the local file system.

input = sc.textFile(inputfilepath)
words = input.flatMap(lambda x: x.split())
wordCount = words.countByValue()

wordCount.saveAsTextFile("file:///home/username/output.txt")

I get an error saying

AttributeError: 'collections.defaultdict' object has no attribute 'saveAsTextFile'

Basically, whatever method I call on the 'wordCount' object, for example collect() or map(), returns the same error. The code works with no problem when the output goes to the terminal (with a for loop), but I can't figure out what is missing to send the output to a file.

Upvotes: 0

Views: 1617

Answers (1)

Kyle Heuton

Reputation: 9768

The countByValue() method that you're calling returns a dictionary of word counts. This is just a standard Python dictionary (a collections.defaultdict, as the error shows), so none of the Spark RDD methods, including saveAsTextFile, are available on it.
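If you specifically want saveAsTextFile, one option is to build the counts with reduceByKey instead, so the result stays an RDD. A minimal sketch, assuming the same sc and inputfilepath from your question (note that saveAsTextFile writes a directory of part files, not a single file):

input = sc.textFile(inputfilepath)
words = input.flatMap(lambda x: x.split())
# turn each word into a (word, 1) pair and sum the counts per word
wordCounts = words.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)
# saveAsTextFile expects a directory path, not a single file name
wordCounts.saveAsTextFile("file:///home/username/output")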

Alternatively, you can use your favorite method to save the dictionary locally.
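For example, a minimal sketch that writes the wordCount dictionary with plain Python file I/O (the output path is just an illustration):

with open("/home/username/output.txt", "w") as f:
    for word, count in wordCount.items():
        # one "word<TAB>count" line per entry
        f.write("%s\t%d\n" % (word, count))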

Upvotes: 1
