ghostMutt

Reputation: 73

Scala, finding max value in arrays

First time I've had to ask a question here, there is not enough info on Scala out there for a newbie like me.

Basically what I have is a file filled with hundreds of thousands of lists formatted like this:

(type, date, count, object)

Rows look something like this:

(food, 30052014, 400, banana)

(food, 30052014, 2, pizza)

All I need to do is find the one row with the highest count.

I know I did this a couple of months ago but can't seem to wrap my head around it now. I'm sure I can do this without a function too. All I want to do is set a value and put that row in it but I can't figure it out.

I think basically what I want to do is a Math.max on the 3rd element in the lists, but I just can't get it.

Any help will be kindly appreciated. Sorry if my wording or formatting of this question isn't the best.

EDIT: There's some extra info I've left out that I should probably add:

All the records are stored in a tsv file. I've done this to split them:

val split_food = food.map(_.split("\t"))

so basically I think I need to use split_food... somehow

Upvotes: 0

Views: 4836

Answers (4)

user3657361

Reputation: 87

Not sure if you got the answer yet, but I had the same issues with maxBy. I found that once I added the import scala.io.Source, I was able to use maxBy and it worked.

Upvotes: 0

Szymon

Reputation: 306

You should use the maxBy function:

case class Purchase(category: String, date: Long, count: Int, name: String)

object Purchase {
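  // Parse one tab-separated line. split returns an Array, so match on Array;
  // the explicit result type is needed because apply is overloaded here.
  def apply(s: String): Purchase = s.split("\t") match {
    case Array(cat, date, count, name) => Purchase(cat, date.toLong, count.toInt, name)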
  }
}

foodRows.map(row => Purchase(row)).maxBy(_.count)
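
A line that does not split into exactly four fields makes the match above throw a MatchError, and a non-numeric count makes toInt throw a NumberFormatException. Here is a minimal sketch of a more defensive variant, assuming it is acceptable to skip bad rows. The helper names parse and maxPurchase and the sample lines are hypothetical, not part of the original data.

```scala
import scala.util.Try

case class Purchase(category: String, date: Long, count: Int, name: String)

// Hypothetical helper: returns None for lines that do not have exactly
// four tab-separated fields or whose numeric fields do not parse.
def parse(line: String): Option[Purchase] = line.split("\t") match {
  case Array(cat, date, count, name) =>
    Try(Purchase(cat, date.toLong, count.toInt, name)).toOption
  case _ => None
}

// Hypothetical helper: returns None when no line could be parsed,
// because maxBy throws on an empty collection.
def maxPurchase(lines: Seq[String]): Option[Purchase] = {
  val parsed = lines.flatMap(line => parse(line))
  if (parsed.isEmpty) None else Some(parsed.maxBy(_.count))
}

// Sample data in the format from the question, plus one malformed line.
val sample = Seq(
  "food\t30052014\t400\tbanana",
  "food\t30052014\t2\tpizza",
  "not a valid row"
)

val best = maxPurchase(sample)
```

Returning an Option pushes the "file was empty or unreadable" case to the caller instead of failing inside maxBy.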

Upvotes: 2

om-nom-nom

Reputation: 62835

Modified version of @Szymon's answer with your edit addressed:

val split_food = food.map(_.split("\t"))
val max_food = split_food.maxBy(tokens => tokens(2).toInt) 

or, analogously:

val max_food = split_food.maxBy { case Array(_, _, count, _) => count.toInt }

In case you're using an Apache Spark RDD, which offers only a limited subset of the usual Scala collection methods (there is no maxBy), you can fall back on reduce:

val max_food = split_food.reduce { (max: Array[String], current: Array[String]) =>
   val curCount = current(2).toInt
   val maxCount = max(2).toInt // you probably would want to preprocess all items, 
                               // so .toInt will not be called again and again 
   if (curCount > maxCount) current else max 
}
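
The preprocessing mentioned in the comment could look roughly like this. It is only a sketch on a plain Seq rather than an RDD, on the assumption that map and reduce are used the same way there. The names withCounts, maxCount and maxFood and the sample rows are hypothetical.

```scala
// Sample rows standing in for the contents of the TSV file.
val food = Seq(
  "food\t30052014\t400\tbanana",
  "food\t30052014\t2\tpizza"
)

// Parse the count once per row and keep it next to the tokens.
val withCounts: Seq[(Int, Array[String])] =
  food.map(_.split("\t")).map(tokens => (tokens(2).toInt, tokens))

// Keep whichever pair carries the larger pre-parsed count.
val (maxCount, maxFood) = withCounts.reduce { (max, current) =>
  if (current._1 > max._1) current else max
}
```

With the count carried alongside each row, reduce compares plain integers and toInt runs once per row instead of on every comparison.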

Upvotes: 5

Lord of the Goo

Reputation: 1287

Simply:

case class Record(food:String, date:String, count:Int)
val l = List(Record("ciccio", "x", 1), Record("buffo", "y", 4), Record("banana", "z", 3))
l.maxBy(_.count)

>>> res8: Record = Record(buffo,y,4)

Upvotes: 0
