Stephane Maarek

Reputation: 5352

Spark: Silently execute sc.wholeTextFiles

I am loading about 200k text files in Spark using input = sc.wholeTextFiles("hdfs://path/*"). I then run println(input.count). It turns out that my Spark shell outputs a ton of text (the path of every file), and after a while it just hangs without returning my result.
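For reference, here is the same thing as a minimal, self-contained sketch for the spark-shell (the hdfs://path/* glob is a placeholder for my actual path):

// Read each file as a (path, content) pair, then count the files
val input = sc.wholeTextFiles("hdfs://path/*")
println(input.count)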

I believe this may be due to the amount of text output by wholeTextFiles. Do you know of any way to run this command silently? Or is there a better workaround?

Thanks!

Upvotes: 1

Views: 654

Answers (1)

Marko Bonaci

Reputation: 5706

How large are your files? From the wholeTextFiles API docs:

Small files are preferred, large files are also allowable, but may cause bad performance.

In conf/log4j.properties, you can suppress excessive logging, like this:

# Set everything to be logged to the console
log4j.rootCategory=ERROR, console

That way, you'll get back only the res result in the REPL, just like in the plain Scala (the language) REPL.
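If you're on Spark 1.4 or newer, you can also raise the threshold from inside a running shell, without touching the properties file; a minimal sketch:

// Raise the log threshold for the current SparkContext (Spark 1.4+)
sc.setLogLevel("ERROR")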

Here are all the other logging levels you can play with: the log4j API.
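For example, assuming you want to quiet Spark's internals while still seeing your own application's logging, you can set per-logger levels in the same file (the com.example.myapp package name is just a placeholder):

# Silence Spark internals below ERROR, keep application logs at INFO
log4j.logger.org.apache.spark=ERROR
log4j.logger.com.example.myapp=INFO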

Upvotes: 1
