Miruthan

Reputation: 417

How to use LEFT and RIGHT keyword in SPARK SQL

I am new to Spark SQL.

In MS SQL Server there is a LEFT function, used for example as CASE WHEN LEFT(ColumnName, 1) IN ('D', 'A') THEN 1 ELSE 0 END.

How can I implement the same in Spark SQL?

Upvotes: 10

Views: 32463

Answers (4)

Powers

Reputation: 19308

Spark SQL has built-in right and left functions as of Spark 2.3.

Suppose you have the following DataFrame.

+----------------------+
|some_string           |
+----------------------+
|this 23 has 44 numbers|
|no numbers            |
|null                  |
+----------------------+

Here's how to get the leftmost two elements using the SQL left function:

import org.apache.spark.sql.functions.expr

df.select(expr("left(some_string, 2)").as("left_two")).show(false)
+--------+
|left_two|
+--------+
|th      |
|no      |
|null    |
+--------+

Passing in SQL strings to expr() isn't ideal. Scala API users don't want to deal with SQL string formatting.

I created a library called bebe that provides easy access to the left function:

df.select(bebe_left(col("some_string"), lit(2)).as("left_two")).show()
+--------+
|left_two|
+--------+
|th      |
|no      |
|null    |
+--------+

The Spark SQL right and bebe_right functions work in a similar manner.
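To make "a similar manner" concrete, here is a plain-Python sketch (not Spark code) of what the SQL right function would return for the same sample rows; the function name sql_right is just an illustrative label, and null/edge-case handling is approximated:

```python
def sql_right(s, n):
    # Plain-Python model of Spark SQL's right(str, n): the last n characters.
    # Nulls pass through as None, mirroring SQL null propagation.
    if s is None:
        return None
    return s[-n:] if n > 0 else ""

rows = ["this 23 has 44 numbers", "no numbers", None]
print([sql_right(s, 2) for s in rows])  # ['rs', 'rs', None]
```

In Spark itself the equivalent would be expr("right(some_string, 2)") or bebe_right, exactly as with left above.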

You can use the Spark SQL functions with the expr hack, but it's better to use the bebe functions, which are more flexible and type safe.

Upvotes: 0

zero323

Reputation: 330113

You can use substring function with positive pos to take from the left:

import org.apache.spark.sql.functions.substring

substring(column, 0, 1)

and negative pos to take from the right:

substring(column, -1, 1)

So in Scala you can define

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.substring

def left(col: Column, n: Int) = {
  assert(n >= 0)
  substring(col, 0, n)
}

def right(col: Column, n: Int) = {
  assert(n >= 0)
  substring(col, -n, n)
}

import spark.implicits._ // for toDF and the $"..." column syntax

val df = Seq("foobar").toDF("str")

df.select(
  Seq(left _, right _).flatMap(f => (1 to 3).map(i => f($"str", i))): _*
).show
+--------------------+--------------------+--------------------+---------------------+---------------------+---------------------+
|substring(str, 0, 1)|substring(str, 0, 2)|substring(str, 0, 3)|substring(str, -1, 1)|substring(str, -2, 2)|substring(str, -3, 3)|
+--------------------+--------------------+--------------------+---------------------+---------------------+---------------------+
|                   f|                  fo|                 foo|                    r|                   ar|                  bar|
+--------------------+--------------------+--------------------+---------------------+---------------------+---------------------+

Similarly in Python:

from pyspark.sql.functions import substring
from pyspark.sql.column import Column

def left(col, n):
    assert isinstance(col, (Column, str))
    assert isinstance(n, int) and n >= 0
    return substring(col, 0, n)

def right(col, n):
    assert isinstance(col, (Column, str))
    assert isinstance(n, int) and n >= 0
    return substring(col, -n, n)

Upvotes: 16

Ryan Widmaier

Reputation: 8513

To build upon user6910411's answer, you can also use isin and when to build a new column with the result of your character comparison.

The final full code would look something like this:

import org.apache.spark.sql.functions._

df.select(substring($"Columnname", 0, 1) as "ch")
    .withColumn("result", when($"ch".isin("D", "A"), 1).otherwise(0))
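The row-level logic this pipeline computes can be sketched in plain Python (illustrative only, not Spark code; the function name flag_first_char is hypothetical):

```python
def flag_first_char(value, allowed=("D", "A")):
    # Models when(substring(col, 1, 1).isin("D", "A"), 1).otherwise(0):
    # 1 if the first character is D or A, otherwise 0. A null value falls
    # through to the otherwise branch, since isin on null is not true.
    if value is None:
        return 0
    return 1 if value[:1] in allowed else 0

print([flag_first_char(v) for v in ["Delta", "Alpha", "Bravo", None]])  # [1, 1, 0, 0]
```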

Upvotes: 4

Nagesh Singh Chauhan

Reputation: 784

import org.apache.spark.sql.functions._  

Use substring(column, 0, 1) instead of the LEFT function,

where

  • 0 : starting position in the string (substring positions are 1-based; a pos of 0 is treated the same as 1)
  • 1 : number of characters to select

Example: consider this LEFT function call:

LEFT(upper(SKU),2)

The corresponding Spark SQL statement would be:

substring(upper(SKU),1,2) 

Upvotes: 5
