How to get Map with matching values

I have a file with values like this:

 user id | item id | rating | timestamp
196 242 3   881250949
186 302 3   891717742
22  377 1   878887116
244 51  2   880606923
166 346 1   886397596
298 474 4   884182806
115 265 2   881171488
253 465 5   891628467
305 451 3   886324817
6   86  3   883603013
62  257 2   879372434
200 222 5   876042340
210 40  3   891035994
224 29  3   888104457
303 785 3   879485318
122 387 5   879270459
194 274 2   879539794
......

I want to find all rows where item id = "590"

and build a Map from the rating values (1-5) to their counts, like this: {1->6, 2->5, 3->10, 4->6, 5->14}

object Parse {

 def main(args: Array[String]): Unit = {

    // extract the data from u.data
    var a: List[(String, String, String, String)] = List()
    for (line <- io.Source.fromFile("F:\\big data\\u.data").getLines) {
      val newLine = line.replace("\t", ",")
      if (newLine.split(",").length < 4) {
        break
      } else {
        val asd = newLine.split(",")
        val userId = asd(0)
        val itemId = asd(1)
        val rating = asd(2)
        val timestamp = asd(3)
        a = a :+ ((userId, itemId, rating, timestamp))
      }
      a = a.filter(_._2.equals("590")) // this filters the list of tuples correctly
      val empty: List[String] = a.map(_._2) // I tried to get a list of all ratings here, but it does not work

    
    }
}

How can I create a map of ratings? As far as I can see, a map of matching values can be generated with groupBy, as described in "Scala groupBy for a list".

Upvotes: 0

Views: 269

Answers (2)

jwvh

Reputation: 51271

If what you want is a Map of rating->count for a given "item id", this should do it.

util.Using(io.Source.fromFile("../junk.txt")) { file =>
  val rec = raw"\d+\s+590\s+(\d+)\s+\d+".r  //only this item id
  file.getLines()
      .collect { case rec(rating) => rating }
      .foldLeft(Map.empty[String, Int]) {
        case (m, r) => m + (r -> (m.getOrElse(r, 0) + 1))
      }
}.getOrElse(Map.empty[String,Int])

Note that the Source returned by fromFile() is closed automatically at the end of the Using block. util.Using is available from Scala 2.13 onward.
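
To make the counting step easier to try without a file, here is a minimal sketch that wraps the same collect/foldLeft idea in a function. The names RatingCounts and countRatings and the itemId parameter are hypothetical and not part of the answer above; the sketch assumes every line has the form "userId itemId rating timestamp", separated by whitespace.

```scala
import scala.util.matching.Regex

object RatingCounts {
  // Hypothetical helper: counts how often each rating occurs for one item id.
  // Assumes lines look like "userId itemId rating timestamp" (whitespace-separated).
  def countRatings(lines: Iterator[String], itemId: String): Map[String, Int] = {
    // Regex.quote keeps the id literal even if it contains regex metacharacters
    val rec: Regex = (raw"\d+\s+" + Regex.quote(itemId) + raw"\s+(\d+)\s+\d+").r
    lines
      .collect { case rec(rating) => rating } // keep only lines that match this item id
      .foldLeft(Map.empty[String, Int]) { (counts, rating) =>
        counts.updated(rating, counts.getOrElse(rating, 0) + 1)
      }
  }
}
```

It could be called as RatingCounts.countRatings(file.getLines(), "590") inside the Using block. Lines that do not match the pattern, including lines for other items, are skipped silently.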

Upvotes: 1

Boris Azanov

Reputation: 4501

I don't think a for-loop is the best choice here. Try looking at this as a data-stream problem rather than an array problem. scala.io.Source.fromFile("F:\\big data\\u.data").getLines() returns an Iterator[String] over the lines, which is better treated as a stream of data than as an array. For your requirements it is simpler to combine map, filter, collect and groupBy to group the rows by rating.

Full correct code:

val sourceFile = scala.io.Source.fromFile("F:\\big data\\u.data")
try {
  // u.data is tab-separated, so split on whitespace rather than on ",".
  // An Iterator can be traversed only once, so materialize it before validating.
  val linesOfArrays = sourceFile.getLines().map { line =>
    line.trim.split("\\s+")
  }.toList
  require(!linesOfArrays.exists(_.length < 4)) // your data schema validation
  val ratingCountsMap: Map[String, Int] = linesOfArrays.collect {
    case rowValuesArray if rowValuesArray(1) == "590" =>
      // emit the rating paired with 1 so it can be counted
      rowValuesArray(2) -> 1
  }
    .groupBy { case (rating, _) => rating }
    // map instead of mapValues: mapValues is deprecated in 2.13 and returns a view, not a Map
    .map { case (rating, groupWithSameRating) => rating -> groupWithSameRating.length }
} finally sourceFile.close()

And don't forget to release the resource (in your case, the file) by calling close in a finally block, or use the scala-arm library.
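
As an alternative to groupBy followed by a count, here is a minimal sketch using groupMapReduce. It assumes Scala 2.13 or later and whitespace-separated rows; the names RatingHistogram and ratingCounts are hypothetical. Unlike the require check above, rows that do not have exactly four columns are skipped rather than rejected.

```scala
object RatingHistogram {
  // Hypothetical helper: rating -> number of occurrences for one item id.
  // Assumes Scala 2.13+ and rows of the form "userId itemId rating timestamp".
  def ratingCounts(lines: Iterator[String], itemId: String): Map[Int, Int] =
    lines
      .map(_.trim.split("\\s+"))
      // the backticks match the second column against the value of itemId
      .collect { case Array(_, `itemId`, rating, _) => rating.toInt }
      .toList // Iterator has no groupMapReduce, so materialize the ratings first
      .groupMapReduce(identity)(_ => 1)(_ + _)
}
```

Called as RatingHistogram.ratingCounts(sourceFile.getLines(), "590") inside the try block, this yields Int keys rather than String keys. rating.toInt throws a NumberFormatException if a matching row holds a non-numeric rating.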

Upvotes: 0
