amprie286

Reputation: 107

Looping through Map Spark Scala

In this code we have two files: athletes.csv, which contains athlete names, and twitter.test, which contains tweet messages. We want to find, for every line in twitter.test, the names that match a name in athletes.csv. We applied a map function to store the names from athletes.csv and want to check every name against every line in the test file.

import java.nio.charset.CodingErrorAction

import scala.io.{Codec, Source}

import org.apache.log4j.{Level, Logger}
import org.apache.spark.SparkContext

object twitterAthlete {

  def loadAthleteNames(): Map[String, String] = {

    // Handle character encoding issues:
    implicit val codec = Codec("UTF-8")
    codec.onMalformedInput(CodingErrorAction.REPLACE)
    codec.onUnmappableCharacter(CodingErrorAction.REPLACE)

    // Build a Map from athlete name (field 1) to an info field (field 7),
    // read from athletes.csv.
    var athleteInfo: Map[String, String] = Map()
    val lines = Source.fromFile("../athletes.csv").getLines()
    for (line <- lines) {
      val fields = line.split(',')
      if (fields.length > 1) {
        athleteInfo += (fields(1) -> fields(7))
      }
    }

    athleteInfo
  }

  def parseLine(line: String): String = {
    val athleteInfo = loadAthleteNames()
    var hello = new String
    for ((k, v) <- athleteInfo) {
      if (line.contains(k)) {
        hello = k
      }
    }
    hello
  }


  def main(args: Array[String]): Unit = {
    Logger.getLogger("org").setLevel(Level.ERROR)

    val sc = new SparkContext("local[*]", "twitterAthlete")

    val lines = sc.textFile("../twitter.test")
    val athleteInfo = loadAthleteNames()

    // Keep the tweet text (field 2) from lines that split into exactly 4 fields:
    val splitting = lines.map(x => x.split(";")).map(x => if (x.length == 4 && x(2).length <= 140) x(2))

    val container = splitting.map(x => for ((key, value) <- athleteInfo) if (x.toString.contains(key)) { key }).cache

    container.collect().foreach(println)

    // val mapping = container.map(x => (x, 1)).reduceByKey(_ + _)
    // mapping.collect().foreach(println)
  }
}

The first file looks like:

id,name,nationality,sex,height........  
001,Michael,USA,male,1.96 ...
002,Json,GBR,male,1.76 ....
003,Martin,female,1.73 . ...

The second file looks like:

time, id , tweet .....
12:00, 03043, some message that contain some athletes names  , .....
02:00, 03023, some message that contain some athletes names , .....

something like this ...

But I got an empty result after running this code; any suggestions are much appreciated.

The result I got is empty:

()....
()...
()...

but the result I expected is something like:

(name,1)
(other name,1)

Upvotes: 2

Views: 1760

Answers (2)

edkeveked

Reputation: 18381

You need to use yield to return a value from the for comprehension inside your map. Without yield, the comprehension evaluates to Unit, which is why you see () printed for every line.

 val container = splitting.map(x => for ((key, value) <- athleteInfo if x.toString.contains(key)) yield (key, 1)).cache
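To then get the (name, count) pairs the question expects, a minimal follow-up sketch (my addition, assuming the same splitting RDD and athleteInfo map from the question) flattens the per-line matches with flatMap and reduces by key:

 // Sketch: flatten the per-line matches, then count each matched name.
 val counts = splitting
   .flatMap(x => for ((key, _) <- athleteInfo if x.toString.contains(key)) yield (key, 1))
   .reduceByKey(_ + _)

 counts.collect().foreach(println)  // e.g. (name,1), (other name,2)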

Upvotes: 1

Silvio

Reputation: 4207

I think you should just start with the simplest option first...

I would use DataFrames so you can use the built-in CSV parsing and leverage Catalyst, Tungsten, etc.

Then you can use the built-in Tokenizer to split the tweets into words, explode, and do a simple join. Depending on how big or small the athlete-name data is, you'll end up with a more optimized broadcast join and avoid a shuffle.

import org.apache.spark.sql.functions._
import org.apache.spark.ml.feature.Tokenizer
import spark.implicits._  // needed outside the shell for the 'columnName symbol syntax

val tweets = spark.read.format("csv").load(...)
val athletes = spark.read.format("csv").load(...)

val tokenizer = new Tokenizer()
tokenizer.setInputCol("tweet")
tokenizer.setOutputCol("words")

val tokenized = tokenizer.transform(tweets)

val exploded = tokenized.withColumn("word", explode('words))

val withAthlete = exploded.join(athletes, 'word === 'name)

withAthlete.select(exploded("id"), 'name).show()
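If the athlete list is known to be small, you can also make the broadcast join explicit with the broadcast hint from org.apache.spark.sql.functions (a sketch of an optional variant, my addition rather than part of the answer above):

// Optional: force a broadcast of the small athletes DataFrame to avoid a shuffle.
val withAthleteBcast = exploded.join(broadcast(athletes), 'word === 'name)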

Upvotes: 1
