Spark - Text File to (String, String)

Question

I have a text file with two tab-separated "columns":

Japan	Shinjuku
Australia	Melbourne
United States of America	New York
Australia	Canberra
Australia	Sydney
Japan	Tokyo

I read this file into an RDD and perform the following operation:

val myFile = sc.textFile("/user/abc/textfile.txt")
myFile.map(str => str.split("	")).collect()

which results in

Array[Array[String]] = Array(Array(Japan,Tokyo), Array(United States of America,Washington DC), Array(Australia,Canberra))

But what I want is not Array[Array[String]] but Array[(String, String)], so I tried the following:

myFile.map(str => str.split("	")).map(arr => (arr[0], arr[1])).collect

And got the following error:

<console>:1: error: identifier expected but integer literal found.
   myFile.map(str => str.split("	")).map(arr => (arr[0], arr[1])).collect
                                                     ^

Could anyone help me with this? What I want is a list of (country, city) pairs so I can perform the following operation:

ListThatIWant(Country, City)
    .map(a => (a._1, 1))
    .reduceByKey(_ + _)
    .reduce((a, b) => if (a._2 > b._2) a else b)

This would give me the country with the most cities in the text file, along with the number of cities (occurrences) in that file.

Ramesh Maharjan · Accepted Answer

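In Scala, unlike Java, array elements are accessed with () rather than []. Square brackets are reserved for type parameters in Scala, which is why the compiler reported "identifier expected but integer literal found". So the correct way is: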

myFile.map(str => str.split("	")).map(arr => (arr(0), arr(1))).collect
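For the follow-up goal in the question (the country with the most cities), the whole pipeline might look roughly like the sketch below. It is only an illustration under a few assumptions: Spark is on the classpath and run in local mode, the object name CountryWithMostCities and the helper parseLine are made up for this example, and the sample lines are inlined with sc.parallelize instead of being read from the file in the question. It also uses pattern matching on the split result, so lines that do not have exactly two fields are skipped rather than throwing an ArrayIndexOutOfBoundsException on arr(1).

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CountryWithMostCities {
  // Hypothetical helper: split a line on tabs and keep it only if it has
  // exactly two fields (country, city). Malformed lines yield None.
  def parseLine(line: String): Option[(String, String)] =
    line.split("\t") match {
      case Array(country, city) => Some((country, city))
      case _                    => None
    }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, purely for trying the example out.
    val conf = new SparkConf().setAppName("CountryWithMostCities").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Inline sample data standing in for sc.textFile("/user/abc/textfile.txt").
      val lines = sc.parallelize(Seq(
        "Japan\tShinjuku",
        "Australia\tMelbourne",
        "United States of America\tNew York",
        "Australia\tCanberra",
        "Australia\tSydney",
        "Japan\tTokyo"
      ))

      // RDD[(String, String)] of (country, city); malformed lines are dropped.
      val pairs = lines.flatMap(line => parseLine(line).toList)

      // Count cities per country: RDD[(String, Int)].
      val counts = pairs.map { case (country, _) => (country, 1) }.reduceByKey(_ + _)

      // Keep the (country, count) pair with the largest count.
      // Note this is a plain reduce over the whole RDD, not reduceByKey.
      val top = counts.reduce((a, b) => if (a._2 >= b._2) a else b)

      println(top) // prints (Australia,3) for the sample data above
    } finally {
      sc.stop()
    }
  }
}
```

In spark-shell, where sc already exists, only the body of the try block is needed. If two countries tie for the highest count, which one reduce returns is not guaranteed.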
