I'm new to Scala programming. I need to retrieve a column value from a DataFrame into a variable, based on the value of another column.
I have the following DataFrame:
I need to get the value of the location column into a variable, based on a name that is passed in. For example, if the name passed in is 'xxx', I need the value 'India' from the DataFrame in a variable.
If I understand you correctly, this is just a filter followed by selecting the corresponding location value. The following code is an example:
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DataTypes._
import org.apache.spark.sql.types.{StructField, StructType}
// ScalaTest 3.1+ location of FunSuite (org.scalatest.FunSuite was removed in 3.2)
import org.scalatest.funsuite.AnyFunSuite

class FilterTest extends AnyFunSuite {
  test("filter test") {
    val spark = SparkSession.builder()
      .master("local")
      .appName("filter test")
      .getOrCreate()

    val schema = StructType(
      Seq(
        StructField("name", StringType, true),
        StructField("age", IntegerType, true),
        StructField("location", StringType, true)
      )
    )
    val data = Seq(
      Row("XXX", 34, "India"),
      Row("YYY", 42, "China"),
      Row("ZZZ", 36, "America")
    )
    // Build a DataFrame from Rows and an explicit schema (public API,
    // instead of the internal catalyst RowEncoder, which changed in Spark 3.5)
    val dataset = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)

    // Keep the row whose name matches, then read its "location" field
    val value = dataset.filter(col("name") === "XXX").first().getAs[String]("location")
    assert(value == "India")

    spark.stop()
  }
}
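Note that first() throws an exception when no row matches, which can easily happen when the name comes from outside. A minimal sketch of a variant that returns an Option instead; the helper name locationFor and the object LocationLookup are hypothetical, and the column names are assumed to be name and location as in the example above:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object LocationLookup {
  // Hypothetical helper: the "location" of the first row whose "name" equals `name`,
  // or None when nothing matches (first() would throw in that case).
  def locationFor(df: DataFrame, name: String): Option[String] =
    df.filter(col("name") === name)
      .select("location")
      .take(1)              // Array[Row] with at most one element
      .headOption
      .map(_.getString(0))  // the only selected column is location

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("lookup").getOrCreate()
    import spark.implicits._

    val df = Seq(("XXX", 34, "India"), ("YYY", 42, "China"), ("ZZZ", 36, "America"))
      .toDF("name", "age", "location")

    println(locationFor(df, "XXX")) // Some(India)
    println(locationFor(df, "AAA")) // None
    spark.stop()
  }
}
```

take(1) only fetches a single row, so this stays cheap even if the filter would match many rows.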
You can use filter to get the row where the name column is XXX. Once you have that row, you can read any of its columns.
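// Keep only rows whose first column (name) is "XXX"; == is null-safe, unlike .equals
val filteredRows = dataFrame.filter(row => row.getString(0) == "XXX")
// first() throws NoSuchElementException if no row matched; column 2 is location
val location = filteredRows.first().getString(2)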
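Accessing columns by position (get(0), get(2)) breaks silently if the column order changes. A sketch of the same lookup on a typed Dataset, where fields are read by name and type; the case class Person, the object TypedLookup and the helper locationOf are hypothetical and assume the columns are called name, age and location:

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical case class mirroring the DataFrame's columns
case class Person(name: String, age: Int, location: String)

object TypedLookup {
  // Filter on the typed name field and read location from the first match.
  // first() throws NoSuchElementException when no person has that name.
  def locationOf(people: Dataset[Person], name: String): String =
    people.filter(_.name == name).first().location

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("typed lookup").getOrCreate()
    import spark.implicits._ // supplies the Encoder[Person] needed by as[Person]

    val people = Seq(("XXX", 34, "India"), ("YYY", 42, "China"), ("ZZZ", 36, "America"))
      .toDF("name", "age", "location")
      .as[Person] // column names must match the case class field names

    println(locationOf(people, "XXX")) // India
    spark.stop()
  }
}
```

The lambda passed to filter runs on deserialized Person objects, so Spark cannot optimize it as well as a column expression such as col("name") === name; for a small lookup this usually does not matter.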
This assumes the value passed in is unique in the DataFrame; otherwise multiple rows will match and you'll have to handle that case differently. Here is one way to solve it:
scala> import spark.implicits._
import spark.implicits._
scala> val df = Seq(("XXX",34, "India"), ("YYY", 42, "China"), ("ZZZ", 36, "America")).toDF("name", "age", "location")
scala> df.show()
+----+---+--------+
|name|age|location|
+----+---+--------+
| XXX| 34|   India|
| YYY| 42|   China|
| ZZZ| 36| America|
+----+---+--------+
scala> val input = "XXX"
input: String = XXX
scala> val location = df.filter($"name" === input).select("location").first().getString(0)
location: String = India
Hopefully that solves your problem.
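If the name is not guaranteed to be unique, one way to "handle it the other way" is to collect every matching location and decide what to do based on how many there are. A minimal sketch; the object AllLocations, the helper locationsFor and the second XXX row in the sample data are hypothetical, and collect() assumes the set of matches is small enough to fit on the driver:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object AllLocations {
  // Hypothetical helper: every "location" whose row has the given "name".
  // collect() brings all matches to the driver, so keep the match set small.
  def locationsFor(df: DataFrame, name: String): Seq[String] =
    df.filter(col("name") === name)
      .select("location")
      .collect()
      .map(_.getString(0))
      .toSeq

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("all locations").getOrCreate()
    import spark.implicits._

    // Hypothetical data with a duplicated name to show the multi-match case
    val df = Seq(("XXX", 34, "India"), ("XXX", 29, "Nepal"), ("YYY", 42, "China"))
      .toDF("name", "age", "location")

    locationsFor(df, "XXX") match {
      case Seq()       => println("no match")
      case Seq(single) => println(s"unique match: $single")
      case many        => println(s"${many.size} matches: ${many.mkString(", ")}")
    }
    spark.stop()
  }
}
```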