DSEB

Reputation: 67

Execute Variable Generated by Python Function in Pyspark

I have developed a generic Python function that generates a string variable which I want to execute in Spark in order to get the needed DataFrame; details below (let's say I'm using the PySpark shell directly):

# This is the PySpark shell on the Cloudera platform

# Python function
def generic_func(PARAMETERS):
    # Some operations
    return String_VARIABLE_To_Be_Executed

# Calling the function
df = generic_func(PARAMETERS)
exec(df)

But it seems that Spark is still reading it as a string variable, because when I execute the code below:

df.show()

I get the error below:

AttributeError: 'str' object has no attribute 'show'

Just to give you some context, running:

df

prints something like:

"accountDF.alias('L1').join(account.alias('L2'), f.col('L1.MEMBERNAME') == f.col('L2.PARENT_NAME'), how='left')"

The actual output of the variable is more complex than that; this is just to show that the variable contains some Spark functions that need to be executed.

type(df)
<type 'str'>

Our goal is to execute this variable as if we were executing any PySpark DataFrame expression. In other words, we would like to turn this string variable into executable PySpark code.
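The same failure can be reproduced with plain Python, no Spark needed (a minimal sketch, assuming a trivial expression string in place of the generated join expression):

```python
# exec() runs the string as a statement, but it returns None and does
# not rebind df, so df itself is still a str afterwards.
df = "1 + 1"
exec(df)           # executes the expression, result is discarded
print(type(df))    # <class 'str'>  -- hence df.show() raises AttributeError
```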

Can you please help?

Upvotes: 0

Views: 2161

Answers (1)

El Mehdi OUAFIQ

Reputation: 322

Use eval(df) to evaluate the generated expression string and capture the resulting DataFrame (unlike exec, eval returns the value of the expression), as shown below:

df = generic_func(PARAMETERS)
result = eval(df)
result.show()
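To see why this works without needing a Spark session, here is a minimal sketch where a plain Python list stands in for the Spark DataFrame (the generator function and the `accountDF` name here are hypothetical stand-ins for the question's setup):

```python
def generic_func(column):
    # Hypothetical generator: builds an expression string that references
    # a name in the caller's scope, like the generated join expression.
    return "sorted(accountDF, key=lambda row: row[{!r}])".format(column)

accountDF = [{"id": 2}, {"id": 1}]

expr = generic_func("id")   # expr is a str, so expr.show() would fail
result = eval(expr)         # eval returns the evaluated expression
print(result)               # [{'id': 1}, {'id': 2}]
```

The same pattern applies with a real DataFrame: eval evaluates the string in the current scope, so every name it references (accountDF, account, f, ...) must already be defined where eval is called.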

Upvotes: 3
