HiveQL query returns no results and no errors

Question

I am running Apache Hadoop 2.6.0 on Ubuntu 14.0 and I have a table created in Hive 0.13.0 as:

CREATE TABLE IF NOT EXISTS recipes_hive.cuisine (
ID INT COMMENT 'Cuisine ID.', 
name STRING COMMENT 'Cusine name - primary key.', 
area STRING COMMENT 'Name of the area of origin - foreign key.', 
scope STRING COMMENT 'Either country or area.') 
COMMENT 'Table containing cuisines data.'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

And I fill it with data with statement:

LOAD DATA LOCAL INPATH 'path_to_file/CUISINE.csv'
OVERWRITE INTO TABLE recipes_hive.cuisine;

My database has several such tables all created and filled with same procedure. When running simple queries like:

SELECT * FROM cuisine

or even with certain conditions in WHERE clause I get the results as expected but running more complex queries I get squat. For example:

SELECT cuisine.name, SUM(IF (ingredient.category = "fruit",1,2))/count(*) AS      PERC 
FROM cuisine JOIN recipe ON recipe.cuisine = cuisine.name JOIN part_of ON part_of.id_recipe = recipe.id JOIN ingredient ON ingredient.name = part_of.ingredient 
GROUP BY cuisine.name 
ORDER BY PERC DESC

, or:

SELECT ingredient.id, ingredient.name 
FROM cuisine JOIN recipe ON recipe.cuisine = cuisine.name JOIN part_of ON part_of.id_recipe = recipe.id JOIN ingredient ON ingredient.name = part_of.ingredient 
WHERE ingredient.id IN (
SELECT ingredient.id 
FROM cuisine c JOIN recipe ON recipe.cuisine = c.name JOIN part_of ON part_of.id_recipe = recipe.id JOIN ingredient ON ingredient.name = part_of.ingredient 
WHERE c.name = "Pakistan") AND cuisine.name = "Bangladesh"

The first example calculates some percentage and the second one checks for mutual elements.

The MapReduce and Hadoop get invoked properly and they return no error. The output ends with:

Execution completed successfully
MapredLocal task succeeded
OK
Time taken: 122.119 seconds

I have checked the web and people had similar issues as I do. I checked:

Hive Table returning empty result set on all queries

Simple Hive query is empty

but failed to get my issue resolved. The data is in fact in the HDFS and as mentioned before it works on simple queries.

So either there is something wrong with my Hive instance or my queries are not written correctly.

Any help would be greatly appreciated. Best regards.

ForeverLearner · Accepted Answer

If we have Cuisine table containing ID = {1,2,3} and Recipe table containing ID = {5,6,7} , then even though these tables are non-empty, we still get no rows returned when we do INNER JOIN Cuisine.ID = Recipe.ID (as the ID's are different in the 2 tables) Can you please check that there is no such condition.

SELECT count(1)
FROM cuisine c JOIN recipe ON recipe.cuisine = c.name WHERE c.name = "Pakistan";

--- must return > 0 

select count(1) from recipe as recipe
JOIN part_of ON part_of.id_recipe = recipe.id ;

--- must return > 0 

select count(1) from part_of as part_of
JOIN ingredient ON ingredient.name = part_of.ingredient ;

--- must return > 0

So the inner query returns a row when all the count(*) are non-zero. Now test the outer select :

SELECT ingredient.id, ingredient.name 
FROM cuisine JOIN recipe ON recipe.cuisine = cuisine.name JOIN part_of ON part_of.id_recipe = recipe.id JOIN ingredient ON ingredient.name = part_of.ingredient 
WHERE ingredient.id =  and cuisine.name = "Bangladesh";

HiveQL query returns no results and no errors

Answers (2)

Related Questions