Reputation: 2646
filtered_df = filtered_df.withColumn('POINT', substring('POINT', instr(filtered_df.POINT, "#"), 30))
I need to get the index of the first # in the string and then pass that index as the substring starting position, as above. What would be the way to do that?
This gives me TypeError: Column is not iterable.
Upvotes: 2
Views: 2411
Reputation: 3676
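The substring function from pyspark.sql.functions only takes a fixed starting position and length, which is why passing the Column returned by instr raises the TypeError. However, your approach will work if you write it as a SQL expression with F.expr: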
import pyspark.sql.functions as F
d = [{'POINT': 'The quick # brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog'},
{'POINT': 'The quick brown fox jumps over the lazy dog.# The quick brown fox jumps over the lazy dog.'}]
df = spark.createDataFrame(d)
df.withColumn('POINT', F.expr("substring(POINT, instr(POINT, '#'), 30)")).show(2, False)
+------------------------------+
|POINT |
+------------------------------+
|# brown fox jumps over the laz|
|# The quick brown fox jumps ov|
+------------------------------+
Upvotes: 4