Reputation: 21
I'm new to Scala and I cannot work out what is causing this error. I have searched similar topics, but unfortunately none of the suggestions worked for me. I have some simple code that should find the line with the most words in a README.md file. The code I wrote is:
val readme = sc.textFile("/PATH/TO/README.md")
readme.map(lambda line :len(line.split())).reduce(lambda a, b: a if (a > b) else b)
and the error is:
Name: Compile Error
Message: <console>:1: error: ')' expected but '(' found.
readme.map(lambda line :len(line.split()) ).reduce( lambda a, b: a
if (a > b) else b ) ^
<console>:1: error: ';' expected but ')' found.
readme.map(lambda line :len(line.split()) ).reduce( lambda a, b: a
if (a > b) else b ) ^
Upvotes: 0
Views: 7113
Reputation: 8279
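Your code isn't valid Scala. The lambda keyword and len(...) are Python syntax, which is why the Scala compiler reports parse errors.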
I think what you might be trying to do is to determine the largest number of words on a single line in a README file using Spark. Is that right? If so, then you likely want something like this:
val readme = sc.textFile("/PATH/TO/README.md")
readme.map(_.split(' ').length).reduce(Math.max)
That last line uses some shorthand: the underscore stands in for the function's argument, and Math.max is passed directly as the reducing function. This alternative version is equivalent, but a little more explicit:
readme.map(line => line.split(' ').length).reduce((a, b) => Math.max(a, b))
The map function converts an RDD of Strings (each line in the file) into an RDD of Ints (the number of words on each line, delimited in this particular case by spaces). The reduce function is then applied pairwise and returns the larger of its two arguments each time, which ultimately yields a single Int value representing the largest number of words on a single line of the file.
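To see the two steps in isolation, here is a minimal sketch that runs them on an ordinary Scala Seq instead of an RDD, so it can be pasted into the Scala REPL or spark-shell without reading a file. The sample lines and the names sampleLines, wordCounts and maxWords are made up for illustration; an RDD[String] returned by sc.textFile accepts the same map and reduce calls.

```scala
// Hypothetical sample data standing in for the lines of README.md.
val sampleLines: Seq[String] = Seq(
  "# Apache Spark",
  "Spark is a fast and general cluster computing system",
  "It provides high-level APIs"
)

// Step 1: map each line to its word count (splitting on single spaces).
val wordCounts: Seq[Int] = sampleLines.map(line => line.split(' ').length)

// Step 2: reduce pairwise, keeping the larger count each time.
val maxWords: Int = wordCounts.reduce((a, b) => Math.max(a, b))

println(wordCounts) // List(3, 9, 4)
println(maxWords)   // 9
```

Note that splitting on a single space treats every space as a delimiter, so two consecutive spaces produce an empty "word" and an empty line still counts as one element. If that matters for your file, splitting the trimmed line on the regular expression "\\s+" is a common alternative.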
After re-reading your question, it seems that you might want to know the line with the most words, rather than how many words it contains. That's a little trickier, but this should work:
readme.map(line => (line.split(' ').length, line)).reduce((a, b) => if(a._1 > b._1) a else b)._2
Now map creates an RDD of (Int, String) tuples, where the first value is the number of words on the line and the second is the line itself. reduce then retains whichever of its two tuple arguments has the larger integer value (._1 refers to the first element of the tuple). Since the result is a tuple, we then use ._2 to retrieve the corresponding line (the second element of the tuple).
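The tuple-based version can be tried out the same way. This is only a sketch on a local Seq with invented sample lines (readmeLines, counted and longestLine are hypothetical names, not part of the question); on the RDD the map and reduce calls are written identically.

```scala
// Hypothetical sample data standing in for the lines of README.md.
val readmeLines: Seq[String] = Seq(
  "# Apache Spark",
  "Spark is a fast and general cluster computing system",
  "It provides high-level APIs"
)

// Pair every line with its word count: (count, line).
val counted: Seq[(Int, String)] =
  readmeLines.map(line => (line.split(' ').length, line))

// Keep whichever tuple has the larger count, then take the line itself.
val longestLine: String =
  counted.reduce((a, b) => if (a._1 > b._1) a else b)._2

println(longestLine)
// Spark is a fast and general cluster computing system
```

If two lines tie for the highest count, this comparison keeps the second argument. On an RDD the order in which pairs are combined is not fixed, so which of the tied lines you get back may vary between runs.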
I'd recommend you read a good book on Scala, such as Programming in Scala, 3rd Edition, by Odersky, Spoon & Venners. There are also some tutorials and an overview of the language on the main Scala language site. Coursera also has some free Scala training courses that you might want to sign up for.
Upvotes: 4