Dil
Dil

Reputation: 417

Get the last element from Apache Spark SQL split() Function

I want to get the last element from the Array that return from Spark SQL split() function.

split(4:3-2:3-5:4-6:4-5:2,'-')

I know it can get by

split(4:3-2:3-5:4-6:4-5:2,'-')[4]

But i want another way when i don't know the length of the Array . please help me.

Upvotes: 10

Views: 21470

Answers (4)

AjeetGiri
AjeetGiri

Reputation: 11

use split(MY_COLUMN,'-').getItem(0) if you are using Java

Upvotes: 0

Valentin
Valentin

Reputation: 365

Please check substring_index it should work exactly as you want:

substring_index(lit("1-2-3-4"), "-", -1) // 4

Upvotes: 12

Mahdi Shahbaba
Mahdi Shahbaba

Reputation: 499

You can also use SparkSql Reverse() function on a column after Split(). For example:

SELECT reverse(split(MY_COLUMN,'-'))[0] FROM MY_TABLE

Here [0] gives you the first element of the reversed array, which is the last element of the initial array.

Upvotes: 24

pheeleeppoo
pheeleeppoo

Reputation: 1525

You could use an UDF to do that, as follow:

val df = sc.parallelize(Seq((1L,"one-last1"), (2L,"two-last2"), (3L,"three-last3"))).toDF("key","Value")
+---+-----------+
|key|Value      |
+---+-----------+
|1  |one-last1  |
|2  |two-last2  |
|3  |three-last3|
+---+-----------+

val get_last = udf((xs: Seq[String]) => Try(xs.last).toOption)

val with_just_last = df.withColumn("Last" , get_last(split(col("Value"), "-")))
+---+-----------+--------+
|key|Value      |Last    |
+---+-----------+--------+
|1  |one-last1  |last1   |
|2  |two-last2  |last2   |
|3  |three-last3|last3   |
+---+-----------+--------+

Remember that the split function from SparkSQL can be applied to a column of the DataFrame.

Upvotes: 3

Related Questions