jazmynn

Reputation: 11

Hadoop data extraction

I am trying to create a process that hits Hadoop and extracts data to my local Windows machine. I successfully created an ODBC connection and was able to test it. Researching further, I found that I needed to use Microsoft Hive ODBC, and I have not been able to get a successful connection test with it. I am open to using different tools, but would like some input on the best way to accomplish what I am trying to do. The data that I am looking for also exists on an FTP server and has been loaded into Hadoop. I could get it from the FTP server but would rather pull it from Hadoop. I am brand new to Hadoop; I have researched and read, but have not been able to find a solution. I know the solution is out there and I am just not looking in the right place. Could someone please point me in the right direction?

Upvotes: -1

Views: 1190

Answers (1)

OneCricketeer

Reputation: 191701

hits Hadoop and extracts data to my local Windows machine

First suggestion: Apache Spark

I successfully created an ODBC connection and was able to test it

Hadoop itself does not provide ODBC; Hive does.

Researching further, I found that I needed to use Microsoft Hive ODBC

Is your data in Azure? That's the only reason you'd be using a Microsoft driver, as far as I can tell.

would like some input on the best way to accomplish what I am trying to do

That much is unclear. You've only mentioned SQL tools so far, and Hadoop itself isn't accessible over ODBC.

If you are storing data in Hive, JDBC/ODBC will work fine, but Spark would be quicker if you decide to run it on a YARN cluster within Hadoop.
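
For the Hive route, the extraction step could be sketched roughly as below. This is only a sketch under assumptions: it takes any Python DB-API 2.0 connection, and for Hive that would typically be one opened through the third-party pyodbc package against an ODBC DSN you have already configured. The helper name export_query_to_csv, the DSN name, the table name, and the Windows path are all hypothetical.

```python
import csv


def export_query_to_csv(conn, query, csv_path, batch_size=1000):
    """Run `query` on a DB-API 2.0 connection and write the rows to a CSV file.

    Returns the number of data rows written (the header row is not counted).
    """
    cursor = conn.cursor()
    try:
        cursor.execute(query)
        # cursor.description has one entry per column; item 0 is the column name.
        header = [col[0] for col in cursor.description]
        written = 0
        with open(csv_path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)
            # Fetch in batches so a large result set is never held in memory at once.
            while True:
                rows = cursor.fetchmany(batch_size)
                if not rows:
                    break
                writer.writerows(rows)
                written += len(rows)
        return written
    finally:
        cursor.close()


# Assumed usage with Hive over ODBC (DSN and table names are placeholders):
#   import pyodbc
#   conn = pyodbc.connect("DSN=MyHiveDSN", autocommit=True)
#   export_query_to_csv(conn, "SELECT * FROM my_table", r"C:\data\my_table.csv")
```

Because the helper only relies on the generic cursor interface, the same function should work unchanged if you later swap ODBC for a JDBC bridge or another Hive client that exposes a DB-API connection.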

I could get it from the FTP server but would rather pull it from Hadoop

Personally, I would not recommend that you get it from Hadoop:

  1. Hadoop (more accurately, HDFS) is not a replacement for FTP.
  2. If your files are "small enough" to be stored fine on an FTP server, there is little reason to load them into HDFS, because HDFS is optimized for rather large files.
  3. You are brand new to Hadoop, and you've suggested you can easily pull the files over FTP.
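
If you do go straight to the FTP server, a minimal sketch using Python's standard ftplib module might look like this. The function names download_file and fetch are hypothetical, and the host, credentials, and file names in the usage comment are placeholders.

```python
from ftplib import FTP


def download_file(ftp, remote_name, local_path):
    """Download one file in binary mode over an already logged-in FTP session.

    Returns the number of bytes written to `local_path`.
    """
    total = 0
    with open(local_path, "wb") as fh:
        def write_chunk(chunk):
            nonlocal total
            fh.write(chunk)
            total += len(chunk)
        # RETR streams the file; the callback receives each block as it arrives.
        ftp.retrbinary(f"RETR {remote_name}", write_chunk)
    return total


def fetch(host, user, password, remote_name, local_path):
    """Open a session, log in, download one file, and close the session."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        return download_file(ftp, remote_name, local_path)


# Assumed usage (all values are placeholders):
#   fetch("ftp.example.com", "user", "password", "export.csv", r"C:\data\export.csv")
```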

Second suggestion: If you are dead-set on using a tool within the Hadoop ecosystem, but not explicitly HDFS, try the Apache NiFi project, which provides a GetFTP processor.

Upvotes: 1
