Reputation: 391
I have JSON in the following format:
{"Request": {"TrancheList": {"Tranche": [{"TrancheId": "500192163","OwnedAmt": "26500000", "Curr": "USD" }, { "TrancheId": "500213369", "OwnedAmt": "41000000","Curr": "USD"}]},"FxRatesList": {"FxRatesContract": [{"Currency": "CHF","FxRate": "0.97919983706115"},{"Currency": "AUD", "FxRate": "1.2966804979253"},{ "Currency": "USD","FxRate": "1"},{"Currency": "SEK","FxRate": "8.1561012531034"},{"Currency": "NOK", "FxRate": "8.2454981641398"},{"Currency": "JPY","FxRate": "111.79999785344"},{"Currency": "HKD","FxRate": "7.7568025218916"},{"Currency": "GBP","FxRate": "0.69425159677867"}, {"Currency": "EUR","FxRate": "0.88991723769689"},{"Currency": "DKK", "FxRate": "6.629598372301"}]},"isExcludeDeals": "true","baseCurrency": "USD"}}
The JSON is read from HDFS:
val hdfsRequest = spark.read.json("hdfs://localhost/user/request.json")
val baseCurrency = hdfsRequest.select("Request.baseCurrency").map(_.getString(0)).collect.headOption
var fxRates = hdfsRequest.select("Request.FxRatesList.FxRatesContract")
val fxRatesDF = fxRates.select(explode(fxRates("FxRatesContract"))).toDF("FxRatesContract").select("FxRatesContract.Currency", "FxRatesContract.FxRate").filter($"Currency"===baseCurrency.get)
fxRatesDF.show()
The output I get for fxRatesDF is:
fxRatesDF: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [Currency: string, FxRate: string]
+--------+------+
|Currency|FxRate|
+--------+------+
|     USD|     1|
+--------+------+
How can I grab the value in the first row of the FxRate column?
Upvotes: 14
Views: 120409
Reputation: 111
Smells like Scala! I personally like to use the Dataset API.
case class OutputFormat(Currency: String, FxRate: String)
// assuming `spark` is available
import spark.implicits._
val fxRatesDF = fxRates
  .select(explode(fxRates("FxRatesContract")))
  .toDF("FxRatesContract")
  .select(
    "FxRatesContract.Currency",
    "FxRatesContract.FxRate")
  .filter($"Currency" === baseCurrency.get)
  .as[OutputFormat]
Now, this becomes a little easier to work with.
// grab the value from the first row. No longer dealing with `Row`s, but `OutputFormat`s
val firstRow: OutputFormat = fxRatesDF.first
val example1: String = firstRow.FxRate
// or, you can map over the Dataset and keep only that column (again, type-safe)
val example2: String = fxRatesDF
  .map(_.FxRate) // Now, each element is just a `String`
  .first
If you don't want to deal with case classes, though, you don't have to. Just specify the row as whatever datatype you're working with. Case classes are nice because of IDE autocompletion and named columns, but they are not always convenient.
val fxRatesDF = fxRates
  .select(explode(fxRates("FxRatesContract")))
  .toDF("FxRatesContract")
  .select(
    "FxRatesContract.Currency",
    "FxRatesContract.FxRate")
  .filter($"Currency" === baseCurrency.get)
  .as[(String, String)]
val (currency, fxRate): (String, String) = fxRatesDF.first
val example3 = fxRate
I personally use this last way when I write unit tests. I care less about structure at that point, and typically just prefer one-liners/less code.
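One caveat with `first`: it throws when the Dataset is empty, for example when no rate matches the base currency. Below is a minimal sketch of a safer variant. The object name `FirstFxRate`, the helper `firstRate`, and the in-memory stand-in for `fxRatesDF` are assumptions made for illustration, not part of the original code.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object, for illustration only.
object FirstFxRate {
  // head(1) returns an Array[Row] with at most one element, so headOption
  // yields None on an empty DataFrame instead of throwing.
  def firstRate(df: DataFrame): Option[Double] =
    df.select("FxRate")
      .head(1)
      .headOption
      .map(_.getString(0).toDouble) // FxRate is read from JSON as a string

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("first-fx-rate")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the filtered DataFrame from the question (assumed data).
    val fxRatesDF = Seq(("USD", "1")).toDF("Currency", "FxRate")

    println(firstRate(fxRatesDF)) // Some(1.0)
    spark.stop()
  }
}
```

Returning an `Option` pushes the "no matching currency" case to the caller rather than failing inside Spark.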
Hope this helps!
Upvotes: 0
Reputation: 11
An update to one of the answers, for PySpark:
from pyspark.sql.functions import col
fxRatesDF.select(col("FxRate")).first()[0]
Upvotes: 1
Reputation: 21
One simple way is to just select row and column using indexing. Input Dataframe:
+-----+
|count|
+-----+
|    0|
+-----+
Code:
count = df.collect()[0][0]
print(count)
if count == 0:
    print("First row and First column value is 0")
Output:
0
First row and First column value is 0
Upvotes: 2
Reputation: 198
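I know this is an old post, but I got it to work this way in PySpark:
fxRatesDF.first()[0]
Note that index 0 is the first column (Currency in the question's DataFrame); FxRate is at index 1.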
Upvotes: 7
Reputation: 2681
In Scala, a single line on the filtered DataFrame is enough:
fxRatesDF.first()(1)
or, to get a typed String instead of Any:
fxRatesDF.first().getString(1)
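Positional access depends on the column order, which changes if the `select` changes. As a sketch, `Row.getAs` reads the field by name instead. The object name `FxRateByName`, the helper `rateByName`, and the in-memory stand-in for `fxRatesDF` below are assumptions for illustration.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

// Hypothetical helper object, for illustration only.
object FxRateByName {
  // Reads FxRate from the first row by column name rather than by position.
  // Like first(), this throws if the DataFrame is empty.
  def rateByName(df: DataFrame): String = {
    val row: Row = df.first()
    row.getAs[String]("FxRate")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("fx-rate-by-name")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the filtered DataFrame from the question (assumed data).
    val fxRatesDF = Seq(("USD", "1")).toDF("Currency", "FxRate")

    println(rateByName(fxRatesDF)) // 1
    spark.stop()
  }
}
```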
Upvotes: 1
Reputation: 355
Perhaps this way:
fxRatesDF.take(1)[0][1]
or
fxRatesDF.collect()[0][1]
or
fxRatesDF.first()[1]
Upvotes: 8
Reputation: 758
In a notebook environment that provides display (such as Databricks), showing the value should be as simple as:
display(fxRatesDF.select($"FxRate").limit(1))
Upvotes: 0
Reputation: 861
You can try this method:
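import org.apache.spark.sql.Row
// FxRate is a string column, so match on String (matching on Int would throw a MatchError)
fxRatesDF.select("FxRate").rdd.map { case Row(s: String) => s }.first()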
Upvotes: -1
Reputation: 7732
Use first() and read the field by name (this attribute-style access works on a PySpark Row):
fxRatesDF.first().FxRate
Upvotes: 25