Reputation: 325
I am trying to read a table of a database using PySpark and SQLAlchamy as follows:
SUBMIT_ARGS = "--jars mysql-connector-java-5.1.45-bin.jar pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS
sc = SparkContext('local[*]', 'testSparkContext')
sqlContext = SQLContext(sc)
t0 = time.time()
database_uri = 'jdbc:mysql://{}:3306/{}'.format("127.0.0.1",<db_name>)
dataframe_mysql = sqlContext.read.format("jdbc").options(url=database_uri, driver = "com.mysql.jdbc.Driver", dbtable = <tablename>, user= <user>, password=<password>).load()
print(dataframe_mysql.rdd.map(lambda row :list(row)).collect())
t1 = time.time()
database_uri2 = 'mysql://{}:{}@{}/{}'.format(<user>,<password>,"127.0.0.1",<db_name>)
engine = create_engine(database_uri2)
connection = engine.connect()
s = text("select * from {}.{}".format(<db_name>,<table_name>))
result = connection.execute(s)
for each in result:
print(each)
t2= time.time()
print("Time taken by PySpark:", (t1-t0))
print("Time taken by SQLAlchamy", (t2-t1))
This is the time taken to fetch some 3100 rows:
Time taken by PySpark: 12.326745986938477
Time taken by SQLAlchamy: 0.21664714813232422
Why SQLAlchamy is outperforming PySpark? Is there any way to make this faster? Is there any error in my approach?
Upvotes: 3
Views: 3771
Reputation: 61
Why SQLAlchamy is outperforming PySpark? Is there any way to make this faster? Is there any error in my approach?
More than one. Ultimately you try use Spark in a way it is not intended to be used, measure incorrect thing and introduce incredible amount of indirection. Overall:
DataSource
is inefficient, and as you use it is completely sequential. Check parallellizing reads in Spark Gotchas.local
mode (designed for prototyping and unit testing) is just a cherry on the top.So at the end of the day your code is slow but it is not something you'd use in production application. SQLAlchemy and Spark are designed for complete different purposes - if you're looking for low latency database access layer Spark is not the right choice.
Upvotes: 6