ross k

Reputation: 45

Setting the file path as a parameter didn't work in Python PySpark

I want to run code that ingests data through a JDBC driver and saves it to a file path. It ingests the data successfully, but the save doesn't work. I know data can be saved with code like this:

a.write.mode("overwrite").parquet("test/partition_test.parquet")

Is there any way to set the file path as a parameter? I tried setting it as shown below, but it didn't work.

My code:

def ingest(spark, db_url, tablename, username, password,destination, driver, save_format="parquet"):
    a = spark.read.format("jdbc").option("url",db_url).option("dbtable",tablename).option("user", username).option("password",password).option("path", destination).option("driver",driver).load()   
    return a


ingest(spark, "jdbc:mysql://192.168.122.1:3306/users", "users", "root", "123456@h21","/path", "com.mysql.jdbc.Driver", save_format="parquet")

Upvotes: 0

Views: 620

Answers (1)

Alex Ott

Reputation: 87069

You're mixing two things together in your code. What you need to do happens in two steps:

  1. reading the data into dataframe
  2. writing dataframe into a file

so the code needs to be something like this:

def ingest(spark, db_url, tablename, username, password, destination,
           driver, save_format="parquet"):
    # step 1: read the data into a dataframe
    a = spark.read.format("jdbc") \
        .option("url", db_url) \
        .option("dbtable", tablename) \
        .option("user", username) \
        .option("password", password) \
        .option("driver", driver) \
        .load()
    # step 2: write the dataframe to the destination path
    a.write.format(save_format).save(destination)
    return a

This function will return the dataframe, but if you only need to read and write the data, you can return None instead.
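The same read-then-write separation, with the destination path as an ordinary function parameter, can be sketched without Spark using only the standard library. This is a minimal illustration, not the PySpark API: the names are hypothetical, a list of dicts stands in for the JDBC read, and JSON stands in for Parquet.

```python
import json
import os
import tempfile

def ingest(read_source, destination, save_format="json"):
    # step 1: read the data (a plain list of dicts stands in for the JDBC read)
    rows = read_source()
    # step 2: write it to the destination passed as a parameter
    if save_format == "json":
        with open(destination, "w") as f:
            json.dump(rows, f)
    else:
        raise ValueError(f"unsupported format: {save_format}")
    return rows

# usage: the destination path is just an ordinary argument
dest = os.path.join(tempfile.mkdtemp(), "users.json")
rows = ingest(lambda: [{"id": 1, "name": "alice"}], dest)
```

The point is the same as in the Spark version: nothing about a save path needs to be set during the read; the path only appears in the write step, so it can be passed straight through as a parameter.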

Upvotes: 1

Related Questions