trallallalloo

Reputation: 602

Differences between RDBMS and Spark SQL

I'm working with Apache Spark, and I want to use Spark SQL in my project. First, though, I need to be sure about Spark SQL's query performance. I know that Spark SQL is not as efficient as an RDBMS, but how large is the gap in query time between the two?

For example, I'm working on a virtual machine with 4 GB of RAM and a single CPU core, so it is a slow system. I have a small data set with two tables: the first has 5M records and the second has 1K records. When I join the two tables, the query takes about 60 seconds. Is that normal for Spark SQL on this hardware? The same join in an RDBMS takes far less time, but I can't run a proper comparison because of the hardware limits at my office.

One last question: how can I reduce query time in Spark SQL?

Upvotes: 1

Views: 1379

Answers (1)

Marcelo Manzo

Reputation: 11

I believe the problem is the virtual machine. I was in the same boat, and what fixed it for me was installing Spark directly on Windows (it is possible; just google it). The performance was much better (I have a 4-core laptop with 4 GB of RAM and an SSD).

Spark SQL is really powerful, depending on your needs. Depending on what you compare it with, the performance can be amazing, but you need to do/implement things differently from how you would in a regular RDBMS.
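As one illustration of "doing things differently", here is a rough sketch of the idea behind a broadcast hash join, which is a common way to join a large table with a very small one, as in the 5M-row and 1K-row case from the question. This is plain Python rather than Spark code, and the table names, column names and row counts are made up for the example. In Spark itself, the equivalent is usually requested with the `broadcast()` hint from `pyspark.sql.functions`, and Spark may also choose it automatically for small tables (see `spark.sql.autoBroadcastJoinThreshold`). Whether it helps on a single-core VM is something to measure, not assume.

```python
# Plain-Python sketch of a broadcast hash join (an illustration, not Spark code).
# All table names, column names and sizes below are hypothetical.

def broadcast_hash_join(large_rows, small_rows, key):
    """Inner-join two lists of dicts on `key` by hashing the small side."""
    # Build the lookup table from the small side once. In Spark this is the
    # part that gets copied ("broadcast") to every executor.
    lookup = {}
    for row in small_rows:
        lookup.setdefault(row[key], []).append(row)

    # Stream over the large side and probe the lookup table, so the large
    # table never has to be sorted or shuffled.
    joined = []
    for row in large_rows:
        for match in lookup.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical stand-ins for the small (1K rows) and large tables.
categories = [{"cat_id": i, "cat_name": f"category-{i}"} for i in range(1000)]
events = [{"event_id": i, "cat_id": i % 1000} for i in range(50_000)]

result = broadcast_hash_join(events, categories, "cat_id")
print(len(result))
```

On a one-core machine it may also be worth lowering `spark.sql.shuffle.partitions` (default 200), since scheduling that many tiny tasks adds overhead for a data set this small.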

Upvotes: 0
