Narayanan

Reputation: 29

How to retrieve a column value into a variable based on another column value

I'm new to Scala programming. I have a use case where I need to retrieve a column value into a variable based on another column value in a dataframe.

This is in Scala.

I have the following data frame

Data frame:

+----+---+--------+
|name|age|location|
+----+---+--------+
| XXX| 34|   India|
| YYY| 42|   China|
| ZZZ| 36| America|
+----+---+--------+

I need to get the value of the location column into a variable based on the name column value passed in, i.e. if the passed-in name is 'XXX' I need the value 'India' in a variable from the data frame.
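For reference, here is a minimal Scala sketch that rebuilds this dataframe (a local Spark session is assumed, and the sample values are the ones shown in the table):

import org.apache.spark.sql.SparkSession

// Local Spark session for experimenting (assumed setup, not part of the original question)
val spark = SparkSession.builder()
  .master("local")
  .appName("column lookup")
  .getOrCreate()

import spark.implicits._

// Sample data matching the table above
val df = Seq(
  ("XXX", 34, "India"),
  ("YYY", 42, "China"),
  ("ZZZ", 36, "America")
).toDF("name", "age", "location")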

Upvotes: 0

Views: 1491

Answers (3)

gccodec

Reputation: 342

If I understand correctly what you mean, it's just a matter of filtering and then selecting the corresponding value of location. The following code is an example:

import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.DataTypes._
import org.apache.spark.sql.types.{StructField, StructType}
import org.apache.spark.sql.functions.col
import org.scalatest.FunSuite

class FilterTest extends FunSuite {

  test("filter test") {

    // Local SparkSession for the test
    val spark = SparkSession.builder()
      .master("local")
      .appName("filter test")
      .getOrCreate()

    // Schema of the example dataframe: name, age, location
    val schema = StructType(
      Seq(
        StructField("name", StringType, true),
        StructField("age", IntegerType, true),
        StructField("location", StringType, true)
      )
    )

    // Sample rows matching the dataframe in the question
    val data = Seq(
      Row("XXX", 34, "India"),
      Row("YYY", 42, "China"),
      Row("ZZZ", 36, "America")
    )

    // Build a Dataset[Row], filter on the name column and read the location value
    val dataset = spark.createDataset(data)(RowEncoder(schema))
    val value = dataset.filter(col("name") === "XXX").first().getAs[String]("location")
    assert(value == "India")
  }
}

Upvotes: 1

Rishi Saraf

Reputation: 1812

You can use filter to get the row where the name column's value is XXX. Once you have the row, you can read any column of that row.

// Keep only the rows whose first column (name) equals "XXX"
val filteredRows = dataFrame.filter(row => row.get(0).equals("XXX"))
// Take the first matching row and read the third column (location)
val location = filteredRows.first().getString(2)

Upvotes: 0

Md Shihab Uddin

Reputation: 561

This assumes the value that is passed in is unique in the dataframe; otherwise multiple rows will be returned and you will have to handle them differently (see the sketch after the transcript below). Here is how you can solve it:

scala> import spark.implicits._
import spark.implicits._

scala> val df = Seq(("XXX",34, "India"), ("YYY", 42, "China"), ("ZZZ", 36, "America")).toDF("name", "age", "location")
scala> df.show()
+----+---+--------+
|name|age|location|
+----+---+--------+
| XXX| 34|   India|
| YYY| 42|   China|
| ZZZ| 36| America|
+----+---+--------+
scala> val input = "XXX"
input: String = XXX
scala> val location = df.filter(s"name = '$input'").select("location").collect()(0).getString(0)
location: String = India
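If the passed-in value can match more than one row, one possible way to handle it (a sketch reusing the df and input values from the transcript above) is to collect every matching location instead of taking only the first one:

// Collect all locations whose name matches the input, not just the first
val locations: Array[String] = df
  .filter(s"name = '$input'")
  .select("location")
  .collect()
  .map(_.getString(0))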

Hopefully that solves your requirement.

Upvotes: 0
