Vijeth Devangi

Reputation: 45

Hadoop Data Ingestion

I have the following requirement:

There's an upstream system which makes a key-entry in a database table. That entry indicates that a set of data is available in a database table (Oracle). We have to ingest this data and save it as a Parquet file. No processing of the data is required. This ingestion process should start every time a new key-entry is available.

For this problem statement, we have planned to have a database poller which polls for the key-entry. After reading that entry, we need to ingest the data from an Oracle table. Which tool is best for this ingestion: Kafka, Sqoop, Spark SQL, etc.? Please help.

We also need to ingest CSV files. Ingestion should start only once a file has been completely written. Please let me know how to do this as well.

Upvotes: 2

Views: 696

Answers (2)

Deno George

Reputation: 362

For ingesting relational data you can use Sqoop, and for your scenario you can have a look at https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_incremental_imports

Write a Sqoop incremental job and schedule it using cron; each time the Sqoop job executes, you will have updated data in HDFS.
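A minimal sketch of such a saved incremental job follows. The JDBC URL, credentials, table name, check column, and HDFS paths are placeholders for your environment, and --as-parquetfile needs a reasonably recent Sqoop 1.4.x release:

    # Create a saved Sqoop job that imports only rows added since the last run
    # (append mode on a monotonically increasing column).
    sqoop job --create oracle_incremental_import -- import \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCLPDB \
      --username INGEST_USER \
      --password-file /user/ingest/.oracle.password \
      --table SOURCE_TABLE \
      --incremental append \
      --check-column ENTRY_ID \
      --last-value 0 \
      --as-parquetfile \
      --target-dir /data/landing/source_table \
      -m 1

    # Each run continues from where the previous one stopped; for saved jobs
    # Sqoop keeps the last imported value in its metastore. Put this line in
    # a crontab entry to schedule it.
    sqoop job --exec oracle_incremental_import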

For .csv files you can use Flume. Refer to https://www.rittmanmead.com/blog/2014/05/trickle-feeding-webserver-log-files-to-hdfs-using-apache-flume/
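With Flume's spooling-directory source, files are picked up only after they have been fully written and moved into the spool directory, which matches your requirement of ingesting a CSV only once it is complete. A hypothetical agent definition (agent, channel, directory, and path names are placeholders):

    # Write a minimal Flume agent definition: spooldir source -> memory
    # channel -> HDFS sink.
    cat > csv-agent.properties <<'EOF'
    agent.sources  = csvSpool
    agent.channels = memChannel
    agent.sinks    = hdfsSink

    agent.sources.csvSpool.type     = spooldir
    agent.sources.csvSpool.spoolDir = /data/incoming/csv
    agent.sources.csvSpool.channels = memChannel

    agent.channels.memChannel.type = memory

    agent.sinks.hdfsSink.type          = hdfs
    agent.sinks.hdfsSink.channel       = memChannel
    agent.sinks.hdfsSink.hdfs.path     = /data/landing/csv
    agent.sinks.hdfsSink.hdfs.fileType = DataStream
    EOF

    # Start the agent; the --name value must match the agent name used in
    # the properties file.
    flume-ng agent --conf conf --conf-file csv-agent.properties --name agent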

Upvotes: 2

adranale

Reputation: 2874

With Sqoop you can import data from a database into your Hadoop file system (HDFS).
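For the simplest case, a one-off import that writes the Oracle table straight to Parquet in HDFS could look like this (connection details, table name, and target directory are placeholders):

    # Plain, non-incremental import of one table; -P prompts for the password.
    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCLPDB \
      --username INGEST_USER \
      -P \
      --table SOURCE_TABLE \
      --as-parquetfile \
      --target-dir /data/landing/source_table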

Upvotes: 0
