user3521180

Reputation: 1130

CLI argument with spark-submit while executing python file

I am trying to convert SQL Server tables to .csv format with the below code in PySpark.

    from pyspark import SparkContext
    from pyspark.sql import SQLContext, Row

    sc = SparkContext("local", "Simple App")
    sqlContext = SQLContext(sc)

    df = sqlContext.read.format("jdbc") \
        .option("url", "jdbc:sqlserver://server:port") \
        .option("databaseName", "database") \
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
        .option("dbtable", "table") \
        .option("user", "uid") \
        .option("password", "pwd") \
        .load()

    df.registerTempTable("test")
    df.write.format("com.databricks.spark.csv").save("full_path")

So if I want to convert multiple tables, I would need to write a separate DataFrame for each one. To avoid that, I want to take the database name and the table names as command-line arguments from the user and build the DataFrames in a for loop.

Is this even possible? If yes, can somebody guide me on how to do it through spark-submit?

Upvotes: 0

Views: 3033

Answers (1)

Rahul Gupta

Reputation: 756

Just make the following changes to your code and your spark-submit command:

test.py

    import sys
    from pyspark import SparkContext
    from pyspark.sql import SQLContext, Row

    sc = SparkContext("local", "Simple App")
    sqlContext = SQLContext(sc)

    # Read the database name, table name, and output path from the command line
    db_name = sys.argv[1]
    table_name = sys.argv[2]
    file_name = sys.argv[3]

    df = sqlContext.read.format("jdbc") \
        .option("url", "jdbc:sqlserver://server:port") \
        .option("databaseName", db_name) \
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
        .option("dbtable", table_name) \
        .option("user", "uid") \
        .option("password", "pwd") \
        .load()

    df.registerTempTable("test")
    df.write.format("com.databricks.spark.csv").save(file_name)

Spark-submit command: spark-submit test.py <db_name> <table_name> <file_name>
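To handle multiple tables in a single run, as the question asks, here is a minimal sketch that extends the same idea: the table names are passed as one comma-separated argument and the script loops over them. The comma-separated convention, the output_dir base path, and the placeholder URL/credentials are assumptions for illustration, not part of the original answer:

    import sys
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext("local", "Simple App")
    sqlContext = SQLContext(sc)

    db_name = sys.argv[1]
    # Assumed convention: table names arrive as one comma-separated
    # argument, e.g. "table1,table2,table3"
    table_names = sys.argv[2].split(",")
    output_dir = sys.argv[3]  # hypothetical base path for the CSV output

    for table_name in table_names:
        df = sqlContext.read.format("jdbc") \
            .option("url", "jdbc:sqlserver://server:port") \
            .option("databaseName", db_name) \
            .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
            .option("dbtable", table_name) \
            .option("user", "uid") \
            .option("password", "pwd") \
            .load()
        # Write each table to its own subdirectory under the base path
        df.write.format("com.databricks.spark.csv").save(output_dir + "/" + table_name)

This would be invoked as, for example, spark-submit test.py <db_name> table1,table2,table3 <output_dir>. Note that the SQL Server JDBC driver jar and the spark-csv package must be available to the job, e.g. via spark-submit's --jars or --packages options.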

Upvotes: 5
