Reputation: 860
df = spark.createDataFrame(
    [
        (1, "AxtTR"),  # create your data here, be consistent in the types.
        (2, "HdyOP"),
        (3, "EqoPIC"),
        (4, "OkTEic"),
    ],
    ["id", "label"],  # add your column names here
)
df.show()
The code below is in Python (pandas), where I use the apply function to extract the first 2 letters of every row, in lowercase. I want to replicate the same code in PySpark, where a function is applied to every single row to produce the output.
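def get_string(lst):
    lst = str(lst)
    lst = lst.lower()  # call the method; a bare `lst.lower` is a function object, not a string
    lst = lst[0:2]     # keep the first two characters
    return lst

df['first_2letter'] = df['label'].apply(get_string)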
The column highlighted in yellow in the image below is the expected output.
Upvotes: 0
Views: 395
Reputation: 42422
You can use the relevant Spark SQL functions:
import pyspark.sql.functions as F
df2 = df.withColumn('first_2letter', F.lower('label')[0:2])
df2.show()
+---+------+-------------+
| id| label|first_2letter|
+---+------+-------------+
|  1| AxtTR|           ax|
|  2| HdyOP|           hd|
|  3|EqoPIC|           eq|
|  4|OkTEic|           ok|
+---+------+-------------+
If you want to use user-defined functions, you can define them as:
def get_string(lst):
    lst = str(lst)
    lst = lst.lower()
    lst = lst[0:2]
    return lst

import pyspark.sql.functions as F
df2 = df.withColumn('first_2letter', F.udf(get_string)('label'))
df2.show()
+---+------+-------------+
| id| label|first_2letter|
+---+------+-------------+
|  1| AxtTR|           ax|
|  2| HdyOP|           hd|
|  3|EqoPIC|           eq|
|  4|OkTEic|           ok|
+---+------+-------------+
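One thing to watch with the UDF version: a null `label` reaches the Python function as `None`, and `str(None)` is the string `"None"`, so the function above would return `"no"` for that row. Below is a minimal sketch of a null-safe variant. The name `get_string_null_safe` is made up for illustration, and the registration line is left as a comment so the snippet runs without Spark.

```python
from typing import Any, Optional


def get_string_null_safe(value: Optional[Any]) -> Optional[str]:
    """Return the first two characters of `value`, lowercased, or None for null input."""
    if value is None:
        # Keep nulls as nulls instead of turning them into the text "None".
        return None
    return str(value).lower()[0:2]


# Assumed usage inside PySpark (not executed here):
#   import pyspark.sql.functions as F
#   df2 = df.withColumn('first_2letter', F.udf(get_string_null_safe)('label'))

sample = ["AxtTR", "HdyOP", None, "OkTEic"]
result = [get_string_null_safe(v) for v in sample]
```

The built-in `F.lower` and substring functions already propagate nulls, so this guard is only needed on the UDF route.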
Upvotes: 1