BARATH

Reputation: 372

Can Apache Spark be used in place of Sqoop?

I have tried connecting Spark to MySQL / Teradata and similar RDBMSs over JDBC to fetch data, and was able to analyse the data.

Can Spark be used to store the data to HDFS? Is there any possibility of Spark outperforming Sqoop at this kind of job?

Looking for your valuable answers and explanations.

Upvotes: -1

Views: 889

Answers (1)

Thiago Baldim

Reputation: 7742

The main difference between Sqoop and Spark is that Sqoop will read the data from your RDBMS no matter which one you have, and you don't need to worry much about how your table is configured.

With Spark, loading data over a JDBC connection works a little differently. If your table has no column suitable for partitioning, such as a numeric ID or a timestamp, Spark will load ALL the data into one single partition and then try to process and save it from there. If you do have a column to use for partitioning, then Spark can sometimes be even faster than Sqoop.
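As a rough sketch of what a partitioned read could look like in PySpark: the option names (`url`, `dbtable`, `partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`) are Spark's JDBC data source options, while the helper names, the connection URL, the table, the column, and the HDFS path are made up for illustration. It also assumes you already have a `SparkSession` with the right JDBC driver on the classpath.

```python
def partitioned_jdbc_options(url, table, column, lower, upper, num_partitions):
    """Build Spark JDBC reader options for a parallel, range-partitioned read.

    Hypothetical helper for illustration. `column` should be a numeric,
    date, or timestamp column. The bounds only decide how the range is
    split into partitions; they do not filter rows out of the table.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower >= upper:
        raise ValueError("lower must be smaller than upper")
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": column,
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
    }


def copy_table_to_hdfs(spark, options, hdfs_path):
    """Read the table over JDBC and write it to HDFS as Parquet.

    `spark` is an existing SparkSession; `hdfs_path` is an assumed
    target such as "hdfs:///data/orders".
    """
    df = spark.read.format("jdbc").options(**options).load()
    df.write.mode("overwrite").parquet(hdfs_path)


# Example values (all assumptions): a MySQL table `orders` with a numeric `id`.
options = partitioned_jdbc_options(
    "jdbc:mysql://db-host:3306/shop", "orders", "id", 1, 1000000, 8
)
# With a live session: copy_table_to_hdfs(spark, options, "hdfs:///data/orders")
```

Without the four partitioning options, the same `load()` call would pull the whole table through a single connection into one partition, which is the slow case described above. Credentials (`user`, `password`) would normally be added to the options as well.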

I would recommend you take a look at this doc.

In conclusion, if you are going to do a simple export that needs to run daily with no transformation, I would recommend Sqoop: it is simple to use and will not impact your database that much. Spark will work well IF your table is ready for it; otherwise, go with Sqoop.

Upvotes: 0
