Reputation: 5352
I am loading about 200k text files in Spark using val input = sc.wholeTextFiles("hdfs://path/*")
I then run println(input.count)
It turns out that my Spark shell outputs a ton of text (the paths of every file), and after a while it just hangs without returning my result.
I believe this may be due to the amount of text output by wholeTextFiles. Do you know of any way to run this command silently, or is there a better workaround?
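For reference, here is roughly what I am running (hdfs://path/ stands in for my actual directory):

val input = sc.wholeTextFiles("hdfs://path/*")  // each file becomes one (path, content) record
println(input.count)                            // the shell prints a flood of file paths, then hangs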
Thanks!
Upvotes: 1
Views: 654
Reputation: 5706
How large are your files?
From the wholeTextFiles API:
Small files are preferred, large files are also allowable, but may cause bad performance.
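For context, wholeTextFiles reads each file as a single record, returning an RDD of (path, content) pairs, so every file's full text is held as one value. A quick sketch of inspecting the first record (the path is a placeholder):

val pairs = sc.wholeTextFiles("hdfs://path/*")  // RDD[(String, String)]
pairs.take(1).foreach { case (path, content) =>
  println(s"$path -> ${content.length} chars")  // key: file path, value: entire file contents
}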
In conf/log4j.properties, you can suppress excessive logging, like this:
# Log only errors (and above) to the console
log4j.rootCategory=ERROR, console
That way, you'll get back only the result (res) in the REPL, just like in the plain Scala (the language) REPL.
Here are all the other logging levels you can play with: the log4j API.
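Alternatively, if you'd rather not edit conf/log4j.properties, you can raise the level from inside the Spark shell; a sketch, assuming the log4j 1.x that stock Spark distributions bundle:

import org.apache.log4j.{Level, Logger}
Logger.getRootLogger.setLevel(Level.ERROR)  // suppress everything below ERROR for this session

And if your Spark version has it, sc.setLogLevel("ERROR") achieves the same without touching log4j directly.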
Upvotes: 1