Reputation: 307
I have a tab-separated file (u.data) with values like this:
user id | item id | rating | timestamp
196 242 3 881250949
186 302 3 891717742
22 377 1 878887116
244 51 2 880606923
166 346 1 886397596
298 474 4 884182806
115 265 2 881171488
253 465 5 891628467
305 451 3 886324817
6 86 3 883603013
62 257 2 879372434
200 222 5 876042340
210 40 3 891035994
224 29 3 888104457
303 785 3 879485318
122 387 5 879270459
194 274 2 879539794
......
I want to find all rows where item id = "560" and make a Map from the rating values (1-5) to their counts, like this: {1->6, 2->5, 3->10, 4->6, 5->14}
import scala.util.control.Breaks.{break, breakable}

object Parse {
  def main(args: Array[String]): Unit = {
    // pull the data out of u.data
    var a: List[(String, String, String, String)] = List()
    breakable {
      for (line <- io.Source.fromFile("F:\\big data\\u.data").getLines()) {
        val newLine = line.replace("\t", ",")
        if (newLine.split(",").length < 4) {
          break()
        } else {
          val asd = newLine.split(",")
          val userId = asd(0)
          val itemId = asd(1)
          val rating = asd(2)
          val timestamp = asd(3)
          a = a :+ ((userId, itemId, rating, timestamp))
        }
      }
    }
    a = a.filter(_._2.equals("590")) // <- filters the list of tuples correctly
    val empty: List[String] = a.map(_._2) // <- tried to get a list of all ratings, but it does not work
  }
}
How can I create a map of ratings? As far as I can see, groupBy can generate a map of matching values (see "Scala groupBy for a list").
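A minimal sketch of that groupBy idea applied to a list of (userId, itemId, rating, timestamp) tuples shaped like a in the code above. The rows are made up for illustration, and ratingCounts is a hypothetical helper name. Note that the rating is the third tuple field (_._3); _._2 is the item id.

```scala
object GroupByRatings {
  // Made-up (userId, itemId, rating, timestamp) rows, same shape as the list `a` in the question
  val rows: List[(String, String, String, String)] = List(
    ("196", "590", "3", "881250949"),
    ("186", "590", "5", "891717742"),
    ("22", "590", "3", "878887116"),
    ("244", "51", "2", "880606923")
  )

  // Hypothetical helper: count how often each rating occurs for one item id
  def ratingCounts(data: List[(String, String, String, String)], itemId: String): Map[String, Int] =
    data
      .filter(_._2 == itemId)   // _._2 is the item id
      .map(_._3)                // _._3 is the rating
      .groupBy(identity)        // rating -> all occurrences of that rating
      .map { case (rating, same) => rating -> same.size }

  def main(args: Array[String]): Unit =
    println(ratingCounts(rows, "590"))
}
```

groupBy(identity) builds a Map from each rating to the list of its occurrences; the final map replaces each list with its size.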
Upvotes: 0
Views: 269
Reputation: 51271
If what you want is a Map of rating -> count for a given "item id", this should do it.
util.Using(io.Source.fromFile("../junk.txt")) { file =>
  val rec = raw"\d+\s+590\s+(\d+)\s+\d+".r // only this item id
  file.getLines()
    .collect { case rec(rating) => rating }
    .foldLeft(Map.empty[String, Int]) {
      case (m, r) => m + (r -> (m.getOrElse(r, 0) + 1))
    }
}.getOrElse(Map.empty[String, Int])
Note that the Source returned by fromFile() is automatically closed at the end of the Using block.
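To try the regex and fold without a file, here is a minimal sketch that swaps Source.fromFile for Source.fromString on a few made-up tab-separated lines (it assumes Scala 2.13+, where scala.util.Using is available). Using(...) returns a Try, which is why the code above ends with getOrElse.

```scala
import scala.io.Source
import scala.util.{Try, Using}

object UsingRatingCounts {
  // Made-up lines in the u.data layout: user id, item id, rating, timestamp
  val sample: String =
    "196\t590\t3\t881250949\n" +
    "186\t302\t3\t891717742\n" +
    "22\t590\t5\t878887116\n" +
    "244\t590\t3\t880606923\n"

  // Same regex and fold as above, but reading from a String instead of a file
  val counts: Try[Map[String, Int]] =
    Using(Source.fromString(sample)) { src =>
      val rec = raw"\d+\s+590\s+(\d+)\s+\d+".r // only item id 590; captures the rating
      src.getLines()
        .collect { case rec(rating) => rating }
        .foldLeft(Map.empty[String, Int]) { case (m, r) =>
          m + (r -> (m.getOrElse(r, 0) + 1))
        }
    }

  def main(args: Array[String]): Unit =
    println(counts.getOrElse(Map.empty[String, Int]))
}
```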
Upvotes: 1
Reputation: 4501
I think a for loop is not the best choice here. Try to look at your problem as a data-stream problem rather than an array problem: scala.io.Source.fromFile("F:\\big data\\u.data").getLines() returns an Iterator[String] over your lines, so it is more natural to process it as a stream of data than as an array. In your case it is better to just combine the map, filter, collect and groupBy functions to group the rows by rating.
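Treating getLines() as a stream has one consequence worth keeping in mind: an Iterator can be traversed only once. A minimal sketch, with a hypothetical in-memory iterator standing in for the file's lines, shows a check such as exists leaving nothing for a later step:

```scala
object IteratorOnce {
  // Stand-in for getLines(): a def, so every call builds a fresh Iterator[String]
  def lines: Iterator[String] =
    Iterator("196\t590\t3\t881250949", "22\t590\t5\t878887116")

  // Checking the iterator first walks every element, so nothing is left afterwards
  def afterExists(): (Boolean, Int) = {
    val it = lines.map(_.split("\\s+"))
    val anyShort = it.exists(_.length < 4) // no short row, so exists reads to the end
    (anyShort, it.size)                    // the iterator is now exhausted
  }

  // Materializing once (or doing all the work in a single pass) avoids the problem
  def afterToList(): (Boolean, Int) = {
    val rows = lines.map(_.split("\\s+")).toList
    val anyShort = rows.exists(_.length < 4)
    (anyShort, rows.size)
  }

  def main(args: Array[String]): Unit =
    println(s"${afterExists()} ${afterToList()}")
}
```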
Full code:
import scala.io.Source

val sourceFile = Source.fromFile("F:\\big data\\u.data")
val ratingCountsMap: Map[String, Int] =
  try {
    val linesOfArrays = sourceFile.getLines().map { line =>
      val row = line.split("\\s+") // u.data is tab-separated, so split on whitespace, not ","
      require(row.length >= 4, s"bad row: $line") // your data schema validation, checked row by row
      row
    }
    linesOfArrays
      .collect {
        // keep item id 590 and emit (rating, 1) so the ratings can be counted
        case rowValuesArray if rowValuesArray(1) == "590" => rowValuesArray(2) -> 1
      }
      .toSeq
      .groupBy { case (rating, _) => rating }
      .map { case (rating, group) => rating -> group.length } // not mapValues: in 2.13 it returns a MapView
  } finally sourceFile.close()
And don't forget to release the resource (in your case, the file) by calling its close method in a finally block, or use the scala-arm library (more about resources here).
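The finally guarantee can be checked with a small sketch. TrackedResource is a hypothetical stand-in for the file: its read() always fails and close() records that it was called. The second variant uses scala.util.Using.resource from the Scala 2.13+ standard library, which also closes the resource but rethrows the body's exception instead of wrapping it in a Try.

```scala
import scala.util.Using

object ReleaseResource {
  // Hypothetical resource that records whether close() was called
  final class TrackedResource extends AutoCloseable {
    var closed = false
    def read(): String = throw new RuntimeException("read failed")
    override def close(): Unit = closed = true
  }

  // try/finally: close() runs even though read() throws
  def withTryFinally(): Boolean = {
    val r = new TrackedResource
    try {
      try r.read() finally r.close()
    } catch { case _: RuntimeException => () }
    r.closed
  }

  // Using.resource: same guarantee; the exception from read() is rethrown after close()
  def withUsingResource(): Boolean = {
    val r = new TrackedResource
    try Using.resource(r)(_.read())
    catch { case _: RuntimeException => () }
    r.closed
  }

  def main(args: Array[String]): Unit =
    println(s"${withTryFinally()} ${withUsingResource()}")
}
```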
Upvotes: 0