baobobs

Reputation: 703

Palantir Foundry spark.sql query

When I attempt to query my input table as a view, I get the error com.palantir.foundry.spark.api.errors.DatasetPathNotFoundException. My code is as follows:

def Median_Product_Revenue_Temp2(Merchant_Segments):

    Merchant_Segments.createOrReplaceTempView('Merchant_Segments_View')
    df = spark.sql('select * from Merchant_Segments_View limit 5')  
    return df

I need to dynamically query this table, since I am trying to calculate the median using percentile_approx across numerous fields, and I'm not sure how to do this without using spark.sql.

If I try to avoid using spark.sql to calculate median across numerous fields using something like the below code, it results in the error Missing Transform Attribute: A module object does not have an attribute percentile_approx. Please check the spelling and/or the datatype of the object.

import pyspark.sql.functions as F

exprs = {x: percentile_approx(x, 0.5) for x in df.columns if x not in exclusion_list}
df = df.groupBy(['BANK_NAME', 'BUS_SEGMENT']).agg(exprs)

Upvotes: 4

Views: 1172

Answers (2)

user19423791

Reputation: 11

Try createGlobalTempView. It worked for me.

eg:

df.createGlobalTempView("people")

(I don't know the root cause of why the local temp view does not work.)

Upvotes: 1

baobobs

Reputation: 703

I managed to avoid using dynamic SQL to calculate the median across columns with the following code:

df_result = df.groupBy(group_list).agg(
    *[F.expr(f'percentile_approx(nullif({col}, 0), 0.5)').alias(col)
      for col in df.columns if col not in exclusion_list]
)

Embedding percentile_approx in an F.expr bypassed the issue I was encountering in the second half of my post.

Upvotes: 0
