pandas to_csv function not writing to Blob Storage when called from Spark UDF

Question

I am using a Spark UDF to read data from a GET endpoint and write it as a CSV file to an Azure Blob Storage location.

My GET endpoint takes two query parameters, param1 and param2. Initially, I have a dataframe paramDF with two columns, param1 and param2.

    param1   param2
    12       25
    45       95

Schema:

    paramDF: pyspark.sql.dataframe.DataFrame
        param1: string
        param2: string

Now I write a UDF that accepts the two parameters, register it, and then invoke it for each row in the dataframe. The UDF is below:

    import pandas
    import requests

    def executeRestApi(param1, param2):
        dlist = []
        try:
            # DataUrl and TOKEN are defined elsewhere (not shown here)
            url = DataUrl.format(token=TOKEN, q1=param1, q2=param2)
            print(url)
            response = requests.get(url)
            if response.status_code == 200:
                metrics = response.json()['data']['metrics']
                dic = {}
                dic['metric1'] = metrics['metric1']
                dic['metric2'] = metrics['metric2']
                dlist.append(dic)

            # mode='x' creates the file and fails if it already exists
            pandas.DataFrame(dlist).to_csv("../../dbfs/mnt/raw/Important/MetricData/listofmetrics.csv", header=True, index=False, mode='x')
            return "Success"
        except Exception as e:
            return "Failure"

Register the UDF:

    udf_executeRestApi = udf(executeRestApi, StringType())

Finally, I call the UDF this way:

    paramDF.withColumn("result", udf_executeRestApi(col("param1"), col("param2")))

I don't see any errors when calling the UDF; in fact, it returns "Success" correctly. The only problem is that the files are not written to Azure Blob Storage, no matter what I try. UDFs are primarily meant for custom functionality that returns a value. In my case, however, I am using the UDF to execute the GET API call and the write operation, and that is my main intention here.
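Because the UDF catches every exception and returns only "Failure", the result column cannot say what went wrong. One way to narrow this down is to return the exception text instead. The following is a minimal sketch under stated assumptions: it uses the standard library's csv module as a stand-in for the pandas write, and `write_once` and the temporary path are hypothetical names that do not appear in my code. It also shows that write mode "x" (exclusive creation) raises FileExistsError on a second write to the same path, which may matter when a UDF runs once per row against one fixed file name.

```python
import csv
import os
import tempfile


def write_once(path, rows):
    """Write rows to path with exclusive creation; report the real error on failure."""
    try:
        # Mode "x" refuses to overwrite: open() raises FileExistsError
        # if the target file is already there.
        with open(path, "x", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=["metric1", "metric2"])
            writer.writeheader()
            writer.writerows(rows)
        return "Success"
    except Exception as exc:
        # Returning the exception type and message makes the cause visible
        # in the result column instead of a bare "Failure".
        return f"Failure: {type(exc).__name__}: {exc}"


# Hypothetical target in a fresh temporary directory, for illustration only.
target = os.path.join(tempfile.mkdtemp(), "listofmetrics.csv")
first = write_once(target, [{"metric1": 1, "metric2": 2}])
second = write_once(target, [{"metric1": 3, "metric2": 4}])
print(first)
print(second)
```

The first call creates the file; the second call hits the existing file and returns a string that names the exception. Applying the same pattern inside the UDF would surface the actual error, if there is one, in the "result" column.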

There is no issue with pandas.DataFrame().to_csv() itself: the same line, run separately with a simple list, writes data to Blob Storage correctly.
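One difference between the standalone run and the UDF run that may be worth checking is the relative output path. A relative path is resolved against the working directory of whichever process performs the write, so the same string can point to different locations. The sketch below only illustrates that resolution; `resolve` is a hypothetical helper, both working directories are made-up examples, and whether the process running the UDF really has a different working directory is an assumption to verify, not an established fact.

```python
import posixpath

RELATIVE_TARGET = "../../dbfs/mnt/raw/Important/MetricData/listofmetrics.csv"


def resolve(cwd, relative_path):
    """Return the absolute path a process with working directory cwd would write to."""
    return posixpath.normpath(posixpath.join(cwd, relative_path))


# Hypothetical working directories, for illustration only.
from_two_levels_deep = resolve("/databricks/driver", RELATIVE_TARGET)
from_deeper_directory = resolve("/local_disk0/spark/work/app/0", RELATIVE_TARGET)

print(from_two_levels_deep)
print(from_deeper_directory)
```

Only a working directory exactly two levels below the root makes this relative path land under /dbfs/mnt. Logging os.getcwd() from inside the UDF, or switching to an absolute path, would remove this uncertainty.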

What could be going wrong here?

Note: The environment is Spark on Databricks. The indentation in my actual code is correct.
