Reputation: 391

Getting the first value from spark.sql.Row

I have the following JSON:

{"Request": {"TrancheList": {"Tranche": [{"TrancheId": "500192163","OwnedAmt": "26500000",    "Curr": "USD" }, {  "TrancheId": "500213369", "OwnedAmt": "41000000","Curr": "USD"}]},"FxRatesList": {"FxRatesContract": [{"Currency": "CHF","FxRate": "0.97919983706115"},{"Currency": "AUD", "FxRate": "1.2966804979253"},{ "Currency": "USD","FxRate": "1"},{"Currency": "SEK","FxRate": "8.1561012531034"},{"Currency": "NOK", "FxRate": "8.2454981641398"},{"Currency": "JPY","FxRate": "111.79999785344"},{"Currency": "HKD","FxRate": "7.7568025218916"},{"Currency": "GBP","FxRate": "0.69425159677867"}, {"Currency": "EUR","FxRate": "0.88991723769689"},{"Currency": "DKK", "FxRate": "6.629598372301"}]},"isExcludeDeals": "true","baseCurrency": "USD"}}

The JSON is read from HDFS:

val hdfsRequest = spark.read.json("hdfs://localhost/user/request.json")
val baseCurrency = hdfsRequest.select("Request.baseCurrency").map(_.getString(0)).collect.headOption
val fxRates = hdfsRequest.select("Request.FxRatesList.FxRatesContract")
val fxRatesDF = fxRates.select(explode(fxRates("FxRatesContract"))).toDF("FxRatesContract").select("FxRatesContract.Currency", "FxRatesContract.FxRate").filter($"Currency"===baseCurrency.get)
fxRatesDF.show()

The output I get for fxRatesDF is:

fxRatesDF: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [Currency: string, FxRate: string]
+--------+------+
|Currency|FxRate|
+--------+------+
|     USD|     1|
+--------+------+

How can I grab the value in the first row of the FxRate column?

Upvotes: 14

Answers (10)

t.perk

Reputation: 111

Smells like Scala! I personally like to use the Dataset API.

case class OutputFormat(Currency: String, FxRate: String)

// assuming `spark` is available
import spark.implicits._

val fxRatesDF = fxRates
  .select(explode(fxRates("FxRatesContract")))
  .toDF("FxRatesContract")
  .select(
    "FxRatesContract.Currency", 
    "FxRatesContract.FxRate")
  .filter($"Currency" === baseCurrency.get)
  .as[OutputFormat]

Now, this becomes a little easier to work with.

// grab the product from the first row. No longer dealing with `Row`s, but `OutputFormat`s
val firstRow: OutputFormat = fxRatesDF.first
val example1: String = firstRow.FxRate

// or, you can map over and grab the row (again, type-safe)
val example2: String = fxRatesDF
  .map(_.FxRate) // Now, the row is just a `String`
  .first

If you don't want to deal with case classes, though, you don't have to. Just specify the row as whatever datatype you're working with. Case classes are nice b/c of IDE autocompletion and column names, but not always convenient.

val fxRatesDF = fxRates
  .select(explode(fxRates("FxRatesContract")))
  .toDF("FxRatesContract")
  .select(
    "FxRatesContract.Currency", 
    "FxRatesContract.FxRate")
  .filter($"Currency" === baseCurrency.get)
  .as[(String, String)]

val (currency, fxRate): (String, String) = fxRatesDF.first
val example3 = fxRate

I personally use this last way when I write unit tests. I care less about structure at that point, and typically just prefer one-liners/less code.
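One caveat with first: it throws a NoSuchElementException when the filter matches no rows. Below is a minimal sketch of a variant that returns an Option instead, written for spark-shell or a Scala script with Spark on the classpath. The local session, the in-memory data standing in for the exploded JSON, and the helper name rateFor are all assumptions for illustration, and the conversion to BigDecimal is an extra step that is not part of the code above.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; in spark-shell, getOrCreate
// simply returns the session that already exists.
val spark = SparkSession.builder().master("local[1]").appName("safe-first").getOrCreate()
import spark.implicits._

// Assumption: in-memory stand-in for the exploded (Currency, FxRate) rows.
val rates = Seq(("USD", "1"), ("EUR", "0.88991723769689")).toDF("Currency", "FxRate")

// Hypothetical helper: look up one rate without throwing on an empty result.
def rateFor(currency: String): Option[BigDecimal] =
  rates
    .filter($"Currency" === currency)
    .select($"FxRate")
    .as[String]                 // typed Dataset[String], so no Row handling is needed
    .take(1)                    // Array with zero or one element
    .headOption                 // None when nothing matched
    .map(s => BigDecimal(s))    // FxRate is stored as a string in the JSON

val usd = rateFor("USD")
val missing = rateFor("XXX")
```

Here usd holds the USD rate wrapped in Some, while missing is None rather than an exception, which tends to be easier to handle when the base currency might be absent from the rates list.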

Hope this helps!

Upvotes: 0

abaghel

Reputation: 15297

In PySpark, where a Row exposes its fields as attributes, you can use

fxRatesDF.select(col("FxRate")).first().FxRate
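In Scala, a Row has no members named after its columns, so attribute-style access like the expression above does not compile there. The lookup goes through getAs or a positional getter instead. A minimal sketch, assuming a local SparkSession and an in-memory DataFrame standing in for the filtered fxRatesDF from the question (both are assumptions, intended for spark-shell or a Scala script with Spark on the classpath):

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.col

// Assumption: a local session for illustration; in spark-shell, getOrCreate
// simply returns the session that already exists.
val spark = SparkSession.builder().master("local[1]").appName("first-fx-rate").getOrCreate()
import spark.implicits._

// Assumption: in-memory stand-in for the filtered fxRatesDF from the question.
val fxRatesDF = Seq(("USD", "1")).toDF("Currency", "FxRate")

val firstRow: Row = fxRatesDF.select(col("FxRate")).first()

val byName: String = firstRow.getAs[String]("FxRate") // lookup by column name
val byIndex: String = firstRow.getString(0)            // lookup by position in the selected columns
```

Both values are the string from the FxRate column. getAs by name is less fragile if the select list changes later, while getString(0) avoids the name lookup.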

Upvotes: 33

Avj_10

Reputation: 11

An update to one of the answers, for PySpark:

from pyspark.sql.functions import col
fxRatesDF.select(col("FxRate")).first()[0]

Upvotes: 1

Ashish Dudeja

Reputation: 21

One simple way is to select the row and column by index.

Input DataFrame:

+-----+
|count|
+-----+
|    0|
+-----+

Code:

count = df.collect()[0][0]
print(count)
if count == 0:
    print("First row and First column value is 0")

Output:

0
First row and First column value is 0

Upvotes: 2

DataGuy

Reputation: 198

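I know this is an old post, but I got it to work this way (index 0 is the first column; on the two-column fxRatesDF from the question, FxRate is at index 1):

fxRatesDF.first()[0]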

Upvotes: 7

sumitya

Reputation: 2681

A single line with one word is enough for this (index 1 is the FxRate column of fxRatesDF):

fxRatesDF.first()(1)

Or, as a line with two words, which returns a String rather than Any:

fxRatesDF.first().getString(1)

Upvotes: 1

Val

Reputation: 355

Perhaps this way:

fxRatesDF.take(1)[0][1]

fxRatesDF.collect()[0][1]

fxRatesDF.first()[1]

Upvotes: 8

Chondrops

Reputation: 758

It should be as simple as:

display(fxRatesDF.select($"FxRate").limit(1))

Upvotes: 0

Pi Pi

Reputation: 861

You can try this method:

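// requires: import org.apache.spark.sql.Row
// FxRate is a string column, so the pattern must match a String, not an Int
fxRatesDF.select("FxRate").rdd.map { case Row(rate: String) => rate }.first()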

Upvotes: -1

Thiago Baldim

Reputation: 7732

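The function you need is first(). Use it like this (attribute access on the returned Row is PySpark syntax; in Scala, call getAs[String]("FxRate") on the Row instead):

fxRatesDF.first().FxRate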

Upvotes: 25
