Reputation: 602
I'm working with Apache Spark, and in my project I want to use Spark SQL. But I need to be sure about Spark SQL's query performance. I know that Spark SQL is not as efficient as an RDBMS for this kind of workload, but I want to know: is there a large time gap between Spark SQL and RDBMS queries?
For example, I'm working on a virtual machine with 4 GB RAM and a 1-core CPU, so it is a slow system. I have a small data set with two tables: the first has 5M records, the second has 1K records. When I join the two tables, the query takes about 60 seconds. Is that normal for Spark SQL on this hardware? If I did the same join in an RDBMS it would take much less time, but I can't test it under the same physical limits at the office.
And one last question: how can I reduce the query time in Spark SQL?
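For reference, the join I'm measuring looks roughly like the sketch below (table names, column names, and the Parquet paths are placeholders, not my actual schema):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("JoinTimingExample")
  .getOrCreate()

// big_table: ~5M records, small_table: ~1K records (loaded from Parquet as an example)
val bigTable = spark.read.parquet("/data/big_table")
val smallTable = spark.read.parquet("/data/small_table")

bigTable.createOrReplaceTempView("big_table")
smallTable.createOrReplaceTempView("small_table")

// Plain join between the two tables; this is the query that takes ~60 seconds
val joined = spark.sql(
  "SELECT b.*, s.name FROM big_table b JOIN small_table s ON b.key = s.key")

joined.count() // force execution so the query time can be measured
```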
Upvotes: 1
Views: 1379
Reputation: 11
I believe the problem is the virtual machine. I was in the same boat, and what ended up fixing it was installing Spark directly on Windows (you can do that, just google it). The performance was much better (I have a 4-core laptop with 4 GB RAM and an SSD).
Spark SQL is really powerful, depending on your needs. The performance compared to what you are measuring now can be great, but you need to do and implement things differently than you would in a regular RDBMS.
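One example of "doing things differently": when one side of the join is tiny (your 1K-row table), broadcasting it to all executors avoids shuffling the 5M-row table. A minimal sketch, assuming Spark 2.x and hypothetical table paths/column names:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder()
  .appName("BroadcastJoinExample")
  .getOrCreate()

val bigTable = spark.read.parquet("/data/big_table")     // ~5M rows
val smallTable = spark.read.parquet("/data/small_table") // ~1K rows

// Ask Spark to ship the small table to every executor instead of shuffling the big one
val joined = bigTable.join(broadcast(smallTable), Seq("key"))

// Caching helps if the same joined data is queried repeatedly
joined.cache()
joined.count()
```

Spark can also pick a broadcast join on its own when the small table is below the `spark.sql.autoBroadcastJoinThreshold` setting, but with only 4 GB RAM and 1 core you will still be limited mostly by the hardware.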
Upvotes: 0