Reputation: 73
First time I've had to ask a question here, there is not enough info on Scala out there for a newbie like me.
Basically what I have is a file filled with hundreds of thousands of lists formatted like this:
(type, date, count, object)
Rows look something like this:
(food, 30052014, 400, banana)
(food, 30052014, 2, pizza)
All I need to is find the one row with the highest count.
I know I did this a couple of months ago but can't seem to wrap my head around it now. I'm sure I can do this without a function too. All I want to do is set a value and put that row in it but I can't figure it out.
I think basically what I want to do is a Math.max on the 3rd element in the lists, but I just can't get it.
Any help will be kindly appreciated. Sorry if my wording or formatting of this question isn't the best.
EDIT: There's some extra info I've left out that I should probably add:
All the records are stored in a tsv file. I've done this to split them:
val split_food = food.map(_.split("/t"))
so basically I think I need to use split_food... somehow
Upvotes: 0
Views: 4836
Reputation: 87
Not sure if you got the answer yet but I had the same issues with maxBy. I found once I ran the package... import scala.io.Source
I was able to use maxBy and it worked.
Upvotes: 0
Reputation: 306
You should use maxBy
function:
case class Purchase(category: String, date: Long, count: Int, name: String)
object Purchase {
def apply(s: String) = s.split("\t") match {
case Seq(cat, date, count, name) => Purchase(cat, date.toLong, count.toInt, name)
}
}
foodRows.map(row => Purchase(row)).maxBy(_.count)
Upvotes: 2
Reputation: 62835
Modified version of @Szymon answer with your edit addressed:
val split_food = food.map(_.split("/t"))
val max_food = split_food.maxBy(tokens => tokens(2).toInt)
or, analogously:
val max_food = split_food.maxBy { case Array(_, _, count, _) => count.toInt }
In case you're using apache spark's RDD, which has limited number of usual scala collections methods, you have to go with reduce
val max_food = split_food.reduce { (max: Array[String], current: Array[String]) =>
val curCount = current(2).toInt
val maxCount = max(2).toInt // you probably would want to preprocess all items,
// so .toInt will not be called again and again
if (curCount > maxCount) current else max
}
Upvotes: 5
Reputation: 1287
Simply:
case class Record(food:String, date:String, count:Int)
val l = List(Record("ciccio", "x", 1), Record("buffo", "y", 4), Record("banana", "z", 3))
l.maxBy(_.count)
>>> res8: Record = Record(buffo,y,4)
Upvotes: 0