Vijeth Devangi

Reputation: 45

Hadoop Data Ingestion

I have the following requirement:

There's an upstream system which makes a key-entry in a database table. That entry indicates that a set of data is available in a database table (Oracle). We have to ingest this data and save it as a Parquet file. No processing of the data is required. This ingestion process should start every time a new key-entry is available.

For this problem statement, we have planned to have a database poller which polls for the key-entry. After reading that entry, we need to ingest the data from an Oracle table. Which tool is best for this ingestion: Kafka, Sqoop, Spark SQL, etc.? Please help.

We also need to ingest CSV files. Ingestion should start only once a file has been completely written. Please let me know how to do this as well.

Upvotes: 2

Views: 696

Answers (2)

Deno George

Reputation: 362

For ingesting relational data you can use Sqoop, and for your scenario you can have a look at https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_incremental_imports

Write a Sqoop incremental job and schedule it using cron; each time the Sqoop job executes, you will have updated data in HDFS.
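A minimal sketch of such a saved incremental job follows. The JDBC URL, credentials, table name, check column, and HDFS paths are placeholders for your environment, and --as-parquetfile needs a reasonably recent Sqoop 1.4.x release:

    # Create a saved Sqoop job that imports only rows added since the last run
    # (append mode on a monotonically increasing column).
    sqoop job --create oracle_incremental_import -- import \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCLPDB \
      --username INGEST_USER \
      --password-file /user/ingest/.oracle.password \
      --table SOURCE_TABLE \
      --incremental append \
      --check-column ENTRY_ID \
      --last-value 0 \
      --as-parquetfile \
      --target-dir /data/landing/source_table \
      -m 1

    # Each run continues from where the previous one stopped; for saved jobs
    # Sqoop keeps the last imported value in its metastore. Put this line in
    # a crontab entry to schedule it.
    sqoop job --exec oracle_incremental_import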

For .csv files you can use Flume. Refer to https://www.rittmanmead.com/blog/2014/05/trickle-feeding-webserver-log-files-to-hdfs-using-apache-flume/
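With Flume's spooling-directory source, files are picked up only after they have been fully written and moved into the spool directory, which matches your requirement of ingesting a CSV only once it is complete. A hypothetical agent definition (agent, channel, directory, and path names are placeholders):

    # Write a minimal Flume agent definition: spooldir source -> memory
    # channel -> HDFS sink.
    cat > csv-agent.properties <<'EOF'
    agent.sources  = csvSpool
    agent.channels = memChannel
    agent.sinks    = hdfsSink

    agent.sources.csvSpool.type     = spooldir
    agent.sources.csvSpool.spoolDir = /data/incoming/csv
    agent.sources.csvSpool.channels = memChannel

    agent.channels.memChannel.type = memory

    agent.sinks.hdfsSink.type          = hdfs
    agent.sinks.hdfsSink.channel       = memChannel
    agent.sinks.hdfsSink.hdfs.path     = /data/landing/csv
    agent.sinks.hdfsSink.hdfs.fileType = DataStream
    EOF

    # Start the agent; the --name value must match the agent name used in
    # the properties file.
    flume-ng agent --conf conf --conf-file csv-agent.properties --name agent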

Upvotes: 2

adranale

Reputation: 2874

With Sqoop you can import data from a database into your Hadoop file system (HDFS).
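For the simplest case, a one-off import that writes the Oracle table straight to Parquet in HDFS could look like this (connection details, table name, and target directory are placeholders):

    # Plain, non-incremental import of one table; -P prompts for the password.
    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCLPDB \
      --username INGEST_USER \
      -P \
      --table SOURCE_TABLE \
      --as-parquetfile \
      --target-dir /data/landing/source_table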

Upvotes: 0
