Mohan

Reputation: 473

Get Seq index from element in Spark Sql

I have a dataframe as shown below:

---------------------+------------------------
text                 | featured_text
---------------------+------------------------
sun                  | [type, move, sun]
---------------------+------------------------

I want to search for the value of the "text" column in the "featured_text" array and, if it is present, get its index. In the example above, searching for "sun" in the array [type, move, sun] should return 2 (the zero-based index).

Is there a Spark SQL or Scala function that returns the index of an element?

Upvotes: 1

Views: 791

Answers (1)

Shaido

Reputation: 28322

As far as I know, there is no function to do this directly with the Spark SQL API. However, you can use a UDF instead, as follows (I'm assuming the input dataframe is called df):

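import org.apache.spark.sql.functions.udf
import spark.implicits._  // for the $"..." column syntax; assumes a SparkSession named spark

// Returns the 0-based position of text in featuredText, or -1 if it is absent
val getIndex = udf((text: String, featuredText: Seq[String]) => {
  featuredText.indexOf(text)
})

val df2 = df.withColumn("index", getIndex($"text", $"featured_text"))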

This gives:

+----+-----------------+-----+
|text|    featured_text|index|
+----+-----------------+-----+
| sun|[type, move, sun]|    2|
+----+-----------------+-----+

If the value is not present, the index column will contain -1.
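The -1 comes from Scala's Seq.indexOf, which the UDF calls directly. One thing to watch for is a null array in a row: calling indexOf on null would throw. Below is a minimal sketch of a null-tolerant lookup in plain Scala, assuming you want -1 in that case as well. The names IndexLookup and safeIndexOf are hypothetical and not part of Spark.

```scala
object IndexLookup {
  // Hypothetical helper: same lookup as the UDF body, but tolerant of a null array.
  // Returns the 0-based index of the first match, or -1 if the array is null
  // or the element is absent.
  def safeIndexOf(text: String, featuredText: Seq[String]): Int =
    Option(featuredText).map(_.indexOf(text)).getOrElse(-1)
}
```

It should be possible to wrap this as udf(IndexLookup.safeIndexOf _) and use it in place of getIndex. Separately, Spark 2.4 and later include a built-in array_position SQL function; note that it is 1-based and returns 0 when the element is not found, so the result differs by one from indexOf.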

Upvotes: 2
