Reputation: 473
I have a dataframe as shown below:
+----+-----------------+
|text|    featured_text|
+----+-----------------+
| sun|[type, move, sun]|
+----+-----------------+
I want to look up the value of the "text" column in the "featured_text" array and get its index if it is present. In the example above, searching for "sun" in the array [type, move, sun] should return 2 (the zero-based index).
Is there a Spark SQL function or Scala function that returns the index of an element in an array?
Upvotes: 1
Views: 791
Reputation: 28322
As far as I know, there is no function to do this directly with the Spark SQL API. However, you can use a UDF instead, as follows (I'm assuming the input dataframe is called df):
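import org.apache.spark.sql.functions.udf
import spark.implicits._ // enables the $"..." column syntax; assumes a SparkSession named spark

// Returns the zero-based position of text in featuredText, or -1 if it is absent
val getIndex = udf((text: String, featuredText: Seq[String]) => {
  featuredText.indexOf(text)
})
val df2 = df.withColumn("index", getIndex($"text", $"featured_text"))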
This gives:
+----+-----------------+-----+
|text| featured_text|index|
+----+-----------------+-----+
| sun|[type, move, sun]| 2|
+----+-----------------+-----+
If the value is not present in the array, the index column will contain -1.
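If you are on Spark 2.4 or later, the built-in array_position function may make the UDF unnecessary. The sketch below is one possible way to use it, not a drop-in replacement tested against your data. The object name ArrayPositionSketch, the helper withIndex, and the extra "moon" row are made up for illustration. array_position is 1-based and returns 0 when the element is missing, so subtracting 1 reproduces the zero-based / -1 convention of the UDF above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

object ArrayPositionSketch {
  // Hypothetical helper: adds a zero-based "index" column, -1 when text is absent.
  // array_position(array, element) is 1-based and yields 0 for a missing element,
  // hence the "- 1". Assumes Spark 2.4+ and the column names from the question.
  def withIndex(df: DataFrame): DataFrame =
    df.withColumn("index", expr("array_position(featured_text, text) - 1"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("array-position-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample data: the row from the question plus an illustrative "not found" row
    val df = Seq(
      ("sun", Seq("type", "move", "sun")),
      ("moon", Seq("type", "move", "sun"))
    ).toDF("text", "featured_text")

    withIndex(df).show()
    spark.stop()
  }
}
```

Note that the resulting column is a bigint rather than an int, which only matters if downstream code expects a specific type.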
Upvotes: 2