Mark Tozzi

Reputation: 10903

Automated Testing in Apache Hive

I am about to embark on a project using Apache Hadoop/Hive which will involve a collection of Hive query scripts that produce data feeds for various downstream applications. These scripts seem like ideal candidates for unit testing: they represent the fulfillment of an API contract between my data store and client applications, so it's trivial to write down the expected results for a given set of starting data. My issue is how to run these tests.

If I were working with SQL queries, I could use something like SQLite or Derby to quickly bring up test databases, load test data and run a collection of query tests against them. Unfortunately, I am unaware of any such tools for Hive. At the moment, my best thought is to have the test framework bring up a local Hadoop instance and run Hive against that, but I've never done that before and I'm not sure it will work, or that it's the right path.
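To make the workflow concrete, here is a minimal sketch of the SQL-side pattern described above (bring up a throwaway database, load fixture data, run a query, compare against expected rows), using Python's built-in sqlite3 module. It is a stand-in for illustration only and does not talk to Hive; the table name, fixture rows, query and the helper run_query_test are all hypothetical.

```python
import sqlite3


def run_query_test(setup_sql, query, expected_rows):
    """Load fixture data into a fresh in-memory database, run the query,
    and report whether the result matches the expected rows."""
    conn = sqlite3.connect(":memory:")  # throwaway database, gone on close
    try:
        conn.executescript(setup_sql)   # create tables and load test data
        actual = conn.execute(query).fetchall()
    finally:
        conn.close()
    return actual == expected_rows, actual


# Hypothetical fixture data and feed query, for illustration only.
SETUP_SQL = """
CREATE TABLE page_views (user_id TEXT, url TEXT);
INSERT INTO page_views VALUES ('u1', '/home');
INSERT INTO page_views VALUES ('u1', '/cart');
INSERT INTO page_views VALUES ('u2', '/home');
"""
FEED_QUERY = (
    "SELECT user_id, COUNT(*) FROM page_views "
    "GROUP BY user_id ORDER BY user_id"
)
EXPECTED = [("u1", 2), ("u2", 1)]

if __name__ == "__main__":
    ok, actual = run_query_test(SETUP_SQL, FEED_QUERY, EXPECTED)
    print("PASS" if ok else "FAIL: %r" % (actual,))
```

The goal is the same loop for Hive scripts: a fixture, a query under test, and an expected result set, with the database engine swapped out.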

Also, I'm not interested in a pedantic discussion about whether what I am doing is unit testing or integration testing. I just need to be able to prove my code works.

Upvotes: 10

Views: 12742

Answers (4)

hadoopNerd

Reputation: 71

I know this is an old thread, but just in case someone comes across it: I have followed up on the whole minicluster and Hive testing approach, and found that things have changed with MR2 and YARN, but in a good way. I have put together an article and a GitHub repo to help with it:

http://www.lopakalogic.com/articles/hadoop-articles/hive-testing/

Hope it helps!

Upvotes: 2

Julio Farah

Reputation: 151

I work on a team that supports a big data and analytics platform, and we have run into this kind of issue as well.

We've been searching for a while and found two pretty promising tools: https://github.com/klarna/HiveRunner and https://github.com/bobfreitas/HadoopMiniCluster

HiveRunner is a framework built on top of JUnit for testing Hive queries. It starts a standalone HiveServer with an in-memory HSQL database as the metastore. With it you can stub tables and views, mock sample data, etc.

There are some limitations on which Hive versions it supports, but I definitely recommend it.

Hope it helps you =)

Upvotes: 4

btiernay

Reputation: 8129

You may also want to consider the following blog post, which describes automating unit testing using a custom utility class and Ant: http://dev.bizo.com/2011/04/hive-unit-testing.html

Upvotes: 3

David Gruzman

Reputation: 8088

Hive has a special standalone mode, designed specifically for testing purposes. In this mode it can run without Hadoop. I think it is exactly what you need. Here is a link to the documentation:

http://wiki.apache.org/hadoop/Hive/HiveServer

Upvotes: 4
