Arshanvit
Arshanvit

Reputation: 477

Consuming RESTful API and converting to Dataframe in Apache Spark

I am trying to convert output of url directly from RESTful api to Dataframe conversion in following way:

package trials

import org.apache.spark.sql.SparkSession
import org.json4s.jackson.JsonMethods.parse
import scala.io.Source.fromURL

object DEF {
  implicit val formats = org.json4s.DefaultFormats
  case class Result(success: Boolean,
                    message: String,
                    result: Array[Markets])
  case class Markets(
                      MarketCurrency:String,
                      BaseCurrency:String,
                      MarketCurrencyLong:String,
                      BaseCurrencyLong:String,
                      MinTradeSize:Double,
                      MarketName:String,
                      IsActive:Boolean,
                      Created:String,
                      Notice:String,
                      IsSponsored:String,
                      LogoUrl:String
                    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName(s"${this.getClass.getSimpleName}")
      .config("spark.sql.shuffle.partitions", "4")
      .master("local[*]")
      .getOrCreate()

    import spark.implicits._
    val parsedData = parse(fromURL("https://bittrex.com/api/v1.1/public/getmarkets").mkString).extract[Array[Result]]
    val mySourceDataset = spark.createDataset(parsedData)
    mySourceDataset.printSchema
    mySourceDataset.show()
  }

}

The error is as follows and it repeats for every record:

Caused by: org.json4s.package$MappingException: Expected collection but got JObject(List((success,JBool(true)), (message,JString()), (result,JArray(List(JObject(List((MarketCurrency,JString(LTC)), (BaseCurrency,JString(BTC)), (MarketCurrencyLong,JString(Litecoin)), (BaseCurrencyLong,JString(Bitcoin)), (MinTradeSize,JDouble(0.01435906)), (MarketName,JString(BTC-LTC)), (IsActive,JBool(true)), (Created,JString(2014-02-13T00:00:00)), (Notice,JNull), (IsSponsored,JNull), (LogoUrl,JString(https://bittrexblobstorage.blob.core.windows.net/public/6defbc41-582d-47a6-bb2e-d0fa88663524.png))))))))) and mapping Result[][Result, Result] at org.json4s.reflect.package$.fail(package.scala:96)

Upvotes: 1

Views: 5413

Answers (1)

Antot
Antot

Reputation: 3964

The structure of the JSON returned from this URL is:

{
  "success": boolean,
  "message": string,
  "result": [ ... ]
}

So Result class should be aligned with this structure:

case class Result(success: Boolean,
                  message: String,
                  result: List[Markets])

Update And I also refined slightly the Markets class:

case class Markets(
                    MarketCurrency: String,
                    BaseCurrency: String,
                    MarketCurrencyLong: String,
                    BaseCurrencyLong: String,
                    MinTradeSize: Double,
                    MarketName: String,
                    IsActive: Boolean,
                    Created: String,
                    Notice: Option[String],
                    IsSponsored: Option[Boolean],
                    LogoUrl: String
                  )

End-of-update

But the main issue is in the extraction of the main data part from the parsed JSON:

val parsedData = parse(fromURL("{url}").mkString).extract[Array[Result]]

The root of the returned structure is not an array, but corresponds to Result. So it should be:

val parsedData = parse(fromURL("{url}").mkString).extract[Result]

Then, I suppose that you need not to load the wrapper in the DataFrame, but rather the Markets that are inside. That is why it should be loaded like this:

val mySourceDataset = spark.createDataset(parsedData.result)

And it finally produces the DataFrame:

+--------------+------------+------------------+----------------+------------+----------+--------+-------------------+------+-----------+--------------------+
|MarketCurrency|BaseCurrency|MarketCurrencyLong|BaseCurrencyLong|MinTradeSize|MarketName|IsActive|            Created|Notice|IsSponsored|             LogoUrl|
+--------------+------------+------------------+----------------+------------+----------+--------+-------------------+------+-----------+--------------------+
|           LTC|         BTC|          Litecoin|         Bitcoin|  0.01435906|   BTC-LTC|    true|2014-02-13T00:00:00|  null|       null|https://bittrexbl...|
|          DOGE|         BTC|          Dogecoin|         Bitcoin|396.82539683|  BTC-DOGE|    true|2014-02-13T00:00:00|  null|       null|https://bittrexbl...|

Upvotes: 3

Related Questions