Reputation: 417
I am new to Spark SQL.
In MS SQL we have the LEFT keyword, e.g. CASE WHEN LEFT(Columnname, 1) IN ('D', 'A') THEN 1 ELSE 0 END.
How can I implement the same in Spark SQL?
Upvotes: 10
Views: 32463
Reputation: 19308
Spark SQL has right and left functions as of Spark 2.3.
Suppose you have the following DataFrame.
+----------------------+
|some_string |
+----------------------+
|this 23 has 44 numbers|
|no numbers |
|null |
+----------------------+
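For reference, here's one way to build that DataFrame (a sketch; Option is used to get the null row):
import spark.implicits._

val df = Seq(
  Some("this 23 has 44 numbers"),
  Some("no numbers"),
  None
).toDF("some_string")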
Here's how to get the leftmost two characters using the SQL left function:
import org.apache.spark.sql.functions.expr

df.select(expr("left(some_string, 2)").as("left_two")).show(false)
+--------+
|left_two|
+--------+
|th |
|no |
|null |
+--------+
Passing SQL strings to expr() isn't ideal; Scala API users don't want to deal with SQL string formatting. I created a library called bebe that provides easy access to the left function:
import org.apache.spark.sql.functions.{col, lit} // plus the bebe import for bebe_left

df.select(bebe_left(col("some_string"), lit(2)).as("left_two")).show()
+--------+
|left_two|
+--------+
|th |
|no |
|null |
+--------+
The Spark SQL right and bebe_right functions work in a similar manner.
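For instance, a minimal sketch reusing the DataFrame from above:
df.select(expr("right(some_string, 2)").as("right_two")).show(false)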
You can use the Spark SQL functions with the expr hack, but the bebe functions are more flexible and type safe.
Upvotes: 0
Reputation: 330113
You can use the substring function with a positive pos to take from the left:
import org.apache.spark.sql.functions.substring

substring(column, 0, 1)
and a negative pos to take from the right:
substring(column, -1, 1)
So in Scala you can define:
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.substring

// leftmost n characters (Spark's substring treats pos 0 like pos 1)
def left(col: Column, n: Int) = {
  assert(n >= 0)
  substring(col, 0, n)
}

// rightmost n characters (a negative pos counts from the end of the string)
def right(col: Column, n: Int) = {
  assert(n >= 0)
  substring(col, -n, n)
}
import spark.implicits._ // for toDF and the $"str" syntax

val df = Seq("foobar").toDF("str")

df.select(
  Seq(left _, right _).flatMap(f => (1 to 3).map(i => f($"str", i))): _*
).show
+--------------------+--------------------+--------------------+---------------------+---------------------+---------------------+
|substring(str, 0, 1)|substring(str, 0, 2)|substring(str, 0, 3)|substring(str, -1, 1)|substring(str, -2, 2)|substring(str, -3, 3)|
+--------------------+--------------------+--------------------+---------------------+---------------------+---------------------+
| f| fo| foo| r| ar| bar|
+--------------------+--------------------+--------------------+---------------------+---------------------+---------------------+
Similarly in Python:
from pyspark.sql.functions import substring
from pyspark.sql.column import Column

def left(col, n):
    # leftmost n characters
    assert isinstance(col, (Column, str))
    assert isinstance(n, int) and n >= 0
    return substring(col, 0, n)

def right(col, n):
    # rightmost n characters
    assert isinstance(col, (Column, str))
    assert isinstance(n, int) and n >= 0
    return substring(col, -n, n)
Upvotes: 16
Reputation: 8513
To build upon user6910411's answer, you can also use isin with when/otherwise to build a new column holding the result of your character comparison.
The full code would look something like this:
import org.apache.spark.sql.functions._
import spark.implicits._ // for the $"Columnname" column syntax

df.select(substring($"Columnname", 0, 1) as "ch")
  .withColumn("result", when($"ch".isin("D", "A"), 1).otherwise(0))
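As a quick sanity check, with some throwaway data (hypothetical values; the column name Columnname is taken from the question):
val df = Seq("Dog", "Apple", "Cat").toDF("Columnname")
// Running the select above yields result = 1 for "Dog" and "Apple", 0 for "Cat".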
Upvotes: 4
Reputation: 784
import org.apache.spark.sql.functions._
Use substring(column, 0, 1) instead of the LEFT function.
Example: Consider the LEFT function
LEFT(upper(SKU), 2)
The corresponding Spark SQL statement would be:
substring(upper(SKU), 1, 2)
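As a sketch, this expression could be applied with selectExpr (assuming a DataFrame df with a SKU column):
df.selectExpr("substring(upper(SKU), 1, 2) AS sku_prefix")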
Upvotes: 5