Pankaj Kaundal

Reputation: 1022

What is wrong with this function on pyspark?

I have a DataFrame like the one below, with just one column and one row. I want to create a function that replaces a matched string with some text.

df2 = pd.DataFrame([['_text1']],columns = ['my_texts'])
spark_df = spark.createDataFrame(df2)

def text_func(df, col):
    return df.withColumn("origin_code", sparkSqlFunctions.when("my_texts".startswith('_text1') == True, 'text_passed')
                         .otherwise("my_texts"))

This function doesn't work and gives me the error "'DataFrame' object has no attribute 'text_func'".

I am calling it like this, which is probably wrong:

final = spark_df.withColumn("my_texts", text_func(spark_df, "my_texts"))

Can anyone help me with this?

Upvotes: 1

Views: 227

Answers (2)

Ala Tarighati

Reputation: 3817

In your function, change

"my_texts".startswith('_text1')

to

sparkSqlFunctions.col("my_texts").startswith('_text1') 

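This is likely the cause of the problem. With this change it works for me.

The modification explicitly tells Spark that the first "my_texts" is a column, not a string. The same applies to .otherwise("my_texts"): pass sparkSqlFunctions.col("my_texts") there as well, or non-matching rows receive the literal string "my_texts" instead of the column's value.

You can also remove == True from the code.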
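To see why the quoted name fails, here is a minimal plain-Python sketch (no Spark needed; the variable names are only illustrative). It shows that calling startswith on the quoted column name is evaluated by Python right away, so the result is an ordinary bool rather than a Column expression:

```python
# The quoted name is an ordinary Python string, so startswith runs immediately
# in Python against the characters "my_texts", not against the column's values.
column_name = "my_texts"
result = column_name.startswith("_text1")

print(type(result).__name__, result)  # bool False
```

Wrapping the name in sparkSqlFunctions.col defers the check to Spark, which then evaluates it per row. Also, since text_func as written in the question returns a whole DataFrame, it would presumably be called as final = text_func(spark_df, "my_texts") rather than passed into another withColumn.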

Upvotes: 1

niuer

Reputation: 1669

You can try this:

import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName('test').getOrCreate()
df2 = pd.DataFrame([['_text1']], columns=['my_texts'])
spark_df = spark.createDataFrame(df2)
spark_df.show()

# UDF that replaces values starting with '_text1'; null (None) values pass through unchanged
text_func = F.udf(
    lambda s: 'text_passed' if s is not None and s.startswith('_text1') else s,
    StringType(),
)
df = spark_df.withColumn('my_texts', text_func(spark_df['my_texts']))
df.show()

A simpler way is to do this in one line with the built-in function instr (which, unlike startswith, matches the substring anywhere in the value):

df = spark_df.withColumn("my_texts", F.when(F.instr(spark_df["my_texts"], '_text1') > 0, 'text_passed').otherwise(spark_df["my_texts"]))
df.show()
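The instr check and the startswith check are not quite equivalent. The sketch below uses a plain-Python stand-in for instr (the helper here is hypothetical and only mimics the documented behaviour: 1-based position of the substring, 0 if it is absent) to compare the two conditions on a few made-up sample values:

```python
def instr(text: str, substring: str) -> int:
    """Plain-Python stand-in for SQL instr: 1-based position of substring, 0 if absent."""
    return text.find(substring) + 1


# Made-up sample values, for illustration only.
for value in ["_text1", "_text1_suffix", "prefix_text1", "other"]:
    pos = instr(value, "_text1")
    # Columns: value, position, "instr > 0", "instr == 1", startswith
    print(value, pos, pos > 0, pos == 1, value.startswith("_text1"))
```

If only values that begin with '_text1' should be replaced, comparing the instr result with 1 (or using the column's startswith method) would be the closer match to the original question.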

Upvotes: 1
