Reputation: 1022
I have a data frame like the one below, with just one column and one row. I want to create a function that replaces the matched string with some text.
df2 = pd.DataFrame([['_text1']],columns = ['my_texts'])
spark_df = spark.createDataFrame(df2)
def text_func(df, col):
    return df.withColumn("origin_code", sparkSqlFunctions.when("my_texts".startswith('_text1') == True, 'text_passed')
                         .otherwise("my_texts"))
This function doesn't work and gives me the error "Data Frame object has no attribute 'text_func' ".
I am calling it like this:
final = spark_df.withColumn("my_texts", text_func(spark_df, "my_texts"))
It's probably wrong.
Can anyone help me with this?
Upvotes: 1
Views: 227
Reputation: 3817
In your function, change
"my_texts".startswith('_text1')
to
sparkSqlFunctions.col("my_texts").startswith('_text1')
This is probably the cause of the problem; with this change it works for me.
The modification explicitly tells Spark that the first "my_texts" is a column, not a Python string.
You can also remove == True from the code.
Upvotes: 1
Reputation: 1669
You can try this:
from pyspark.sql import SparkSession, SQLContext, Column
from pyspark.sql import functions as F
from pyspark.sql.types import StringType
from pyspark.sql.functions import udf
import pandas as pd
spark = SparkSession.builder.appName('test').getOrCreate()
df2 = pd.DataFrame([['_text1']],columns = ['my_texts'])
spark_df = spark.createDataFrame(df2)
spark_df.show()
text_func = udf(lambda my_texts: "text_passed" if my_texts.startswith('_text1') else my_texts, StringType())
df = spark_df.withColumn('my_texts', text_func(spark_df['my_texts']))
df.show()
A simpler way is to do this in one line using the instr function:
df = spark_df.withColumn("my_texts", F.when(F.instr(spark_df["my_texts"], '_text1') > 0, 'text_passed').otherwise(spark_df["my_texts"]))
df.show()
Upvotes: 1