Reputation: 371
I am using a Spark UDF to read some data from a GET endpoint and write them as a CSV file to a Azure BLOB location.
My GET endpoint takes 2 query parameters,param1 and param2. So initially, I have a dataframe paramDF that has two columns param1 and param2.
param1 param2
12 25
45 95
Schema: paramDF:pyspark.sql.dataframe.DataFrame
param1:string
param2:string
Now I write a UDF that accept the two parameters, register it, and then invoke this UDF for each row in the dataframe. UDF is as below:
def executeRestApi(param1,param2):
dlist=[]
try:
print(DataUrl.format(token=TOKEN, q1=param1,q2=param2))
response=requests.get(DataUrl.format(token=TOKEN, oid=param1,wid=param2))
if(response.status_code==200):
metrics=response.json()['data']['metrics']
dic={}
dic['metric1'] = metrics['metric1']
dic['metric2'] = metrics['metric2']
dlist.append(dic)
pandas.DataFrame(dlist).to_csv("../../dbfs/mnt/raw/Important/MetricData/listofmetrics.csv",header=True,index=False,mode='x')
return "Success"
except Exception as e:
return "Failure"
Register the UDF:
udf_executeRestApi = udf(executeRestApi, StringType())
Finally the call the UDF this way
paramDf.withColumn("result",udf_executeRestApi(col("param1"),col("param2"))
I dont see any errors while calling the UDF, in fact the UDF returns the value "Success" correctly. Only thing is that the files are not written to Azure BLOB storage, no matter what I try. UDFs' are primarily meant for custom functionality(and return a value).However ,in my case, I am trying to execute the GET API call and the write operation using the UDF(and that is my main intention here).
There is no issue with my pandas.DataFrame().tocsv(),as the same line, when tried separately,with a simple list is writing data to the BLOB correctly.
What could be going wrong here?
Note: Env is Spark on Databricks. There isn't any problem with the indentation, even though it looks untidy here.
Upvotes: 0
Views: 180