sola.carol

Reputation: 47

Questions about Hive

I have this environment:

My goal is:

What I did:

My doubts:

  1. Can I not run SELECT queries directly on data in HDFS using Hive?
  2. Do I have to load the data into Hive before I can query it?
  3. If new data is inserted into the MySQL database, what is the best way to pick it up, write it to HDFS, and make it available in Hive again (ideally in real time)?

Thank you in advance

Upvotes: 0

Views: 252

Answers (2)

Dev

Reputation: 13773

Can I not run SELECT queries directly on data in HDFS using Hive?

You can. Create an external table in Hive that points at your HDFS location, and you can then run any HiveQL query over it.

Do I have to load the data into Hive before I can query it?

With an external table you don't need to load the data into Hive; it stays in the same HDFS directory.
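As a rough illustration of such an external table, the sketch below builds the HiveQL statement as a string. The table name, columns, delimiter and HDFS path are made up for the example (the question does not show the actual data), and it assumes comma-delimited text files.

```python
def external_table_ddl(table, columns, location, delimiter=","):
    """Build a HiveQL CREATE EXTERNAL TABLE statement for delimited text files.

    columns is a list of (name, hive_type) pairs; location is the HDFS
    directory that already contains the data files.
    """
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"  {cols}\n"
        f")\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table, columns and path; replace with your own.
ddl = external_table_ddl(
    "orders",
    [("id", "INT"), ("customer", "STRING"), ("amount", "DOUBLE")],
    "/user/hive/external/orders",
)
print(ddl)
```

The printed statement can be run from the Hive CLI or Beeline. Because the table is external, dropping it later removes only the metadata and leaves the files in HDFS untouched.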

If new data is entered into the MySQL database, what is the best way to get this data?

You can use Sqoop's incremental import for this. It fetches only newly added or updated rows, depending on the incremental mode (append or lastmodified). You can create a saved Sqoop job and schedule it as needed.
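A saved incremental job might look roughly like the command assembled below. This is only a sketch: the job name, JDBC URL, user, password file, table, target directory and check column are all placeholders, and it assumes the source table has an auto-incrementing id column so that append mode applies.

```python
import shlex


def sqoop_incremental_job(job_name, jdbc_url, username, password_file,
                          table, target_dir, check_column, last_value=0):
    """Return the argv list for creating a saved Sqoop incremental-append job."""
    return [
        "sqoop", "job", "--create", job_name, "--",
        "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--target-dir", target_dir,
        "--incremental", "append",       # import only rows with a larger check column
        "--check-column", check_column,
        "--last-value", str(last_value),  # starting point for the first run
    ]


# All values below are hypothetical.
cmd = sqoop_incremental_job(
    job_name="orders_incremental",
    jdbc_url="jdbc:mysql://dbhost/shop",
    username="etl",
    password_file="/user/etl/.mysql_password",
    table="orders",
    target_dir="/user/hive/external/orders",
    check_column="id",
)
print(shlex.join(cmd))
```

Running the job afterwards with sqoop job --exec orders_incremental imports only rows beyond the stored last value, and a saved job updates that value after each successful run. If the target directory is the location of an unpartitioned external table, the new files become visible to Hive queries without a separate load step.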

Upvotes: 3

miko

Reputation: 101

You can try Impala, which is typically much faster than Hive for interactive SQL queries. You need to define tables, most likely specifying a delimiter, the storage format, and where the data is stored on HDFS (I don't know what kind of data you are storing). You can then write SQL queries that read the data from HDFS.

I have no experience with real-time data ingestion from relational databases; however, you can try scheduling Sqoop jobs with cron.
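To make the cron idea concrete, here is a small sketch that produces a crontab line for an hourly run. The job name and log path are hypothetical, and it assumes a saved Sqoop job of that name already exists, that it can authenticate without a prompt, and that sqoop is on the PATH seen by cron.

```python
def hourly_crontab_entry(job_name, minute=0, log_path="/tmp/sqoop_job.log"):
    """Return a crontab line that executes a saved Sqoop job once per hour."""
    if not 0 <= minute <= 59:
        raise ValueError("minute must be between 0 and 59")
    # Fields: minute hour day-of-month month day-of-week command
    return f"{minute} * * * * sqoop job --exec {job_name} >> {log_path} 2>&1"


# Hypothetical job name; run at 15 minutes past every hour.
entry = hourly_crontab_entry("orders_incremental", minute=15)
print(entry)
```

The resulting line can be added with crontab -e. Since cron cannot schedule more often than once a minute, this gives periodic batch ingestion rather than real-time updates.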

Upvotes: 1
