Vishal

Reputation: 1492

SparkSQL: How to select a column value on the basis of a column name

I am working with a dataframe with the following schema:

root
 |-- Id: integer (nullable = true)
 |-- defectiveItem: string (nullable = true)
 |-- item: struct (nullable = true)
 |    |-- gem1: integer (nullable = true)
 |    |-- gem2: integer (nullable = true)
 |    |-- gem3: integer (nullable = true)

The defectiveItem column contains one of the values gem1, gem2, or gem3, and item holds the counts for each item. Depending on the value of defectiveItem, I need to project the count of that defective item from item as a new column named count.

For example, if defectiveItem contains gem1 and item contains {"gem1":3,"gem2":4,"gem3":5}, the resulting count column should contain 3.

The resulting schema should be as follows:

root
 |-- Id: integer (nullable = true)
 |-- defectiveItem: string (nullable = true)
 |-- item: struct (nullable = true)
 |    |-- gem1: integer (nullable = true)
 |    |-- gem2: integer (nullable = true)
 |    |-- gem3: integer (nullable = true)
 |-- count: integer (nullable = true)
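
For reference, a dataframe with this input schema can be built as follows (the sample values here are made up):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// A case class gives the item struct its gem1/gem2/gem3 field names.
case class Item(gem1: Int, gem2: Int, gem3: Int)
val df = Seq(
  (1, "gem1", Item(3, 4, 5)),
  (2, "gem3", Item(1, 2, 9))
).toDF("Id", "defectiveItem", "item")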

Upvotes: 0

Views: 779

Answers (2)

Bartosz Konieczny

Reputation: 2033

You can also solve this with a more classical approach, using Spark's native when/otherwise expressions:

import sparkSession.implicits._
import org.apache.spark.sql.functions._

val defectiveItems = Seq(
  (1, "gem1", Map("gem1" -> 10, "gem2" -> 0, "gem3" -> 0)),
  (2, "gem1", Map("gem1" -> 15, "gem2" -> 0, "gem3" -> 0)),
  (3, "gem1", Map("gem1" -> 33, "gem2" -> 0, "gem3" -> 0)),
  (4, "gem3", Map("gem1" -> 0, "gem2" -> 0, "gem3" -> 2))
).toDF("Id", "defectiveItem", "item")

// Pick the count that matches the defective item with a chained when/otherwise.
val datasetWithCount = defectiveItems.withColumn("count",
  when($"defectiveItem" === "gem1", $"item.gem1")
    .otherwise(when($"defectiveItem" === "gem2", $"item.gem2")
      .otherwise($"item.gem3")))

println("All items=" + datasetWithCount.collectAsList())

It'll print:

All items=[[1,gem1,Map(gem1 -> 10, gem2 -> 0, gem3 -> 0),10], [2,gem1,Map(gem1 -> 15, gem2 -> 0, gem3 -> 0),15], [3,gem1,Map(gem1 -> 33, gem2 -> 0, gem3 -> 0),33], [4,gem3,Map(gem1 -> 0, gem2 -> 0, gem3 -> 2),2]]

By using native functions you can take advantage of Spark's internal optimizations of the execution plan.
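
If the set of gem keys grows, the when chain does not have to be hard-coded; here is a minimal sketch of building it programmatically, assuming the same defectiveItems dataframe as above:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions._

// Fold the known keys into a single chained when expression.
val gems = Seq("gem1", "gem2", "gem3")
val countCol: Column = gems.tail.foldLeft(
  when($"defectiveItem" === gems.head, $"item"(gems.head))
) { (acc, g) => acc.when($"defectiveItem" === g, $"item"(g)) }

val generalized = defectiveItems.withColumn("count", countCol)

Since item is a map column here, Spark 2.4+ also offers a direct lookup with element_at($"item", $"defectiveItem"), which avoids the chain entirely.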

Upvotes: 0

Ramesh Maharjan

Reputation: 41987

You can get your desired output dataframe by using a udf function, as follows:

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions._

// Look up the struct field whose name is given by defectiveItem.
def getItemUdf = udf((defectItem: String, item: Row) => item.getAs[Int](defectItem))

df.withColumn("count", getItemUdf(col("defectiveItem"), col("item"))).show(false)
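
One caveat: getAs throws if defectiveItem ever names a field that does not exist in the struct. A defensive variant (hypothetical, not part of the original answer) could return null instead:

import scala.util.Try
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions._

// Safe variant: returns None (null in the dataframe) for unknown field names.
def getItemSafeUdf = udf((defectItem: String, item: Row) =>
  Try(item.getAs[Int](defectItem)).toOption)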

I hope the answer is useful.

Upvotes: 2
