Thomas Vulto

Reputation: 101

Get Results From URL using Scala Spark

I am really new to Scala and I am trying to figure out how to call a web service and get the response back as a JSON object. I am running into all kinds of problems, most likely because I am making a mistake somewhere. I am stuck, so perhaps someone can help me.

Through some searching I found that I could define a function to call an API (actually, I just found the code for the call and wrapped it in a function):

def GetUrlContent(url: String): String = {
  val source = scala.io.Source.fromURL(url)
  // mkString already returns a String; close the source once it has been read
  try source.mkString finally source.close()
}

So I call this function and get the response as text:

val response: String = GetUrlContent(url).toString()

That is a little redundant, I know, but I tried everything. This is where I run into problems. I tried to get the whole response into an RDD of strings so I can look up specific lines (since I am really new and cannot map the data to JSON yet). I used this statement:

 response.reduce((x,y) => x + y)

However, that gave this error:

Error:(22, 30) type mismatch;
 found   : Int
 required: Char
 response.reduce((x,y) => x + y)

I tried casting x and y to Char, but that doesn't work, so as I said, I have probably skipped something. Can anyone explain why I am getting a Char array and not a String array of lines (as you would when reading a file)? Examples or solutions are always welcome.
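For context on the error above: a Scala String behaves as a sequence of Char, so calling reduce on it walks the characters one by one, and adding two Char values produces an Int. Below is a minimal sketch of that behaviour and of splitting the text into lines instead. The object name and the sample text are made up for illustration; they stand in for whatever GetUrlContent(url) returns.

```scala
object CharVsLines {
  // Hypothetical stand-in for the text returned by GetUrlContent(url)
  val response: String = "first line\nsecond line\nthird line"

  // Char + Char is widened to Int, which is the "found: Int, required: Char" mismatch
  val widened: Int = 'a' + 'b'

  // Splitting on line breaks gives the array of lines the question was after
  val lines: Array[String] = response.split("\\r?\\n")

  // reduce now combines Strings, so + is plain string concatenation
  val joined: String = lines.reduce((x, y) => x + y)

  def main(args: Array[String]): Unit = {
    println(widened)
    println(lines.length)
    println(joined)
  }
}
```

Reading a file with Source.fromFile(...).getLines() yields lines because getLines does the splitting; mkString returns the whole body as one String, which is why the characters show up here.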

Thanks in advance! Thomas

Upvotes: 0

Views: 6735

Answers (1)

Thomas Vulto

Reputation: 101

Okay, so I feel I spent WAY too long on this, but I have learned a lot about Spark & Scala, so it was worth it. For everyone looking for a simple way to make a call and get a JSON DataFrame in response, I eventually made this function, which works for me. Hopefully this helps you further.

import org.apache.spark.sql.{DataFrame, SparkSession}
import scala.io.Source

// Assumes a SparkSession named `spark` is already in scope (as in spark-shell or a notebook)
def GetUrlContentJson(url: String): DataFrame = {
    val source = Source.fromURL(url)
    // Read the whole response body, then close the source
    val result = try source.mkString finally source.close()
    // Only one-line inputs are accepted (I tested it with a complex JSON and it worked)
    val jsonResponseOneLine = result.stripLineEnd
    // You need an RDD to read it with spark.read.json, not a plain String! This took me some time, but it seems obvious now
    val jsonRdd = spark.sparkContext.parallelize(jsonResponseOneLine :: Nil)

    spark.read.json(jsonRdd)
}
val response = GetUrlContentJson(url)
response.show()
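One note for newer Spark versions: as far as I know, the RDD[String] overload of spark.read.json has been deprecated since Spark 2.2 in favour of one that takes a Dataset[String]. A minimal sketch of that variant follows, assuming Spark 2.2 or later. The object name, the jsonToDf helper, the local session settings and the sample payload are all made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JsonStringToDataFrame {
  // Hypothetical helper: parses one JSON document held in a String into a DataFrame
  def jsonToDf(spark: SparkSession, json: String): DataFrame = {
    import spark.implicits._ // provides the String encoder that toDS() needs
    spark.read.json(Seq(json).toDS())
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("json-sketch")
      .getOrCreate()

    // Made-up payload standing in for the web service response
    val sample = """{"name": "example", "values": [1, 2, 3]}"""

    val df = jsonToDf(spark, sample)
    df.printSchema()
    df.show()

    spark.stop()
  }
}
```

The body fetched with Source.fromURL can be passed in as the json argument in the same way as the sample string.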

Upvotes: 10
