Reputation: 1316
Below is the Hive query I am using, together with a custom ranking function. I am running it on my local machine.
SELECT numeric_id, location, Rank(location), followers_count
FROM (
SELECT numeric_id, location, followers_count
FROM twitter_data
DISTRIBUTE BY numeric_id, location
SORT BY numeric_id, location, followers_count desc
) a
WHERE Rank(location)<10;
My Rank function is as follows:
package org.apache.hadoop.hive.contrib.udaf.ex;
import org.apache.hadoop.hive.ql.exec.UDF;
public final class Rank extends UDF {
    private int counter;
    private String last_key;

    public int evaluate(final String key) {
        if (!key.equalsIgnoreCase(this.last_key)) {
            this.counter = 0;
            this.last_key = key;
        }
        return this.counter++;
    }
}
I build a JAR from the class above and then run the following statements before executing the Hive query. I tried both a runnable JAR and a plain JAR.
ADD JAR /home/adminpc/Downloads/Project_input/Rank.jar;
CREATE TEMPORARY FUNCTION Rank AS 'org.apache.hadoop.hive.contrib.udaf.ex.Rank';
This is what I get after executing the Hive query:
hive> SELECT numeric_id, location, Rank(location), followers_count
> FROM (
> SELECT numeric_id, location, followers_count
> FROM twitter_data
> DISTRIBUTE BY numeric_id, location
> SORT BY numeric_id, location, followers_count desc
> ) a
> WHERE Rank(location)<1;
FAILED: NullPointerException null
Upvotes: 2
Views: 30252
Reputation: 63162
Your UDF does not appear to protect against null values in the input table. Specifically, examine what happens when location is null: key.equalsIgnoreCase(...) is then called on a null reference.
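As a minimal sketch of what null protection in evaluate could look like (not necessarily the complete fix for your error): the class name NullSafeRank is hypothetical, and to keep the example self-contained it does not extend org.apache.hadoop.hive.ql.exec.UDF as your real class would. Treating all null keys as one group is also an assumption; you may prefer to handle them differently.

```java
// Hypothetical null-safe variant of the Rank logic from the question.
// Assumption: in Hive this class would extend org.apache.hadoop.hive.ql.exec.UDF;
// that is left out here so the sketch compiles without Hive on the classpath.
final class NullSafeRank {
    private int counter;
    private String lastKey;

    public int evaluate(final String key) {
        // Never call a method on key before checking it for null.
        // Assumption: consecutive null keys are treated as the same group.
        final boolean sameKey = (key == null)
                ? (lastKey == null)
                : key.equalsIgnoreCase(lastKey); // equalsIgnoreCase(null) is false, not an exception

        if (!sameKey) {
            // New group: restart the counter and remember the key.
            counter = 0;
            lastKey = key;
        }
        return counter++;
    }
}
```

With this check a null location no longer throws inside evaluate; the counter simply restarts whenever the key changes, whether to or from null.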
Upvotes: 3