Reputation: 69
My Spark code reads like this:
val logData = sc.textFile("hdfs://localhost:9000/home/akshat/recipes/recipes/simplyrecipes/*/*/*/*")

def doSomething(line: String): (Long, Long) = {
  val numAs = logData.filter(line => line.contains("a")).count()
  val numBs = logData.filter(line => line.contains("b")).count()
  (numAs, numBs)
}

val mapper = logData.map(doSomething _)
val save = mapper.saveAsTextFile("hdfs://localhost:9000/home/akshat/output3")
mapper is of type org.apache.spark.rdd.RDD[(Long, Long)] (a MappedRDD).
When I try to perform the saveAsTextFile action, it throws a java.lang.NullPointerException.
What am I doing wrong, and what changes should I make to fix this exception?
Thanks in advance!
Upvotes: 1
Views: 1080
Reputation: 13927
You should not reference logData from within doSomething. That is the issue: an RDD cannot be used inside a function that Spark runs on the executors, because the driver-side RDD reference is not available inside the task closure, which is what produces the NullPointerException. I can't tell exactly what you are trying to do, but if all you want is to count the lines containing "a" and "b", you don't need the def at all; just do:
val numAs = logData.filter(line => line.contains("a")).count();
val numBs = logData.filter(line => line.contains("b")).count();
If on the other hand you are trying to count "a" and "b" in each line, and write out a line for every input, then try this:
def doSomething(line: String): (Int, Int) = {
  // Compare against the Char literal 'a', not the String "a":
  // ch.equals("a") compares a Char to a String and is always false.
  val numAs = line.count(ch => ch == 'a')
  val numBs = line.count(ch => ch == 'b')
  (numAs, numBs)
}
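To write one output record per input line you would then map this function over the RDD, e.g. logData.map(doSomething).saveAsTextFile("hdfs://localhost:9000/home/akshat/output3"). As a runnable sketch without a Spark cluster, a plain List can stand in for the RDD, since Scala collections expose the same map API:

```scala
// Count 'a' and 'b' characters in one line.
// Pure function: safe to use inside map, since it touches no RDD.
def doSomething(line: String): (Int, Int) = {
  val numAs = line.count(_ == 'a')
  val numBs = line.count(_ == 'b')
  (numAs, numBs)
}

// A local List stands in for logData here; on a cluster you would call
// logData.map(doSomething).saveAsTextFile(...) instead.
val lines = List("banana bread", "abc")
val counts = lines.map(doSomething)
// counts == List((4, 2), (1, 1))
```

Because doSomething closes over nothing but its argument, Spark can serialize it to the executors without hitting the NullPointerException from the question.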
Upvotes: 5