Reputation: 337
I am using apache-hive-1.2.1
and created a table :
test_table : key -> integer and name -> string
key name
--------------
1 name1
2 name2
3 3
I am surprised how following queries are giving the corresponding outputs, since
the column is of type string :
select sum(name) from test_table;
Output : 3.0
and
select avg(name) from test_table;
Output : 3.0
Does hive perform aggregations even if the column type is string? How hive does this?
Upvotes: 1
Views: 4206
Reputation: 605
An aggregate function that returns the sum of a set of numbers. Its single argument can be numeric column, or the numeric result of a function or expression applied to the column value. Rows with a NULL value for the specified column are ignored. If the table is empty, or all the values supplied to MIN are NULL, SUM returns NULL.
When the query contains a GROUP BY clause, returns one value for each combination of grouping values.
Return type: BIGINT for integer arguments, DOUBLE for floating-point arguments
Upvotes: 0
Reputation: 2049
please refer the code of hive builtin UDAF-sum() , which takes the signature below accepts "Only numeric or string type arguments(line-71)
name = "sum", value = "_FUNC_(x) - Returns the sum of a set of numbers"
Inside the GenericUDAFEvaluator() you can find case STRING: return new GenericUDAFSumDouble(); at line 66 and 67 , which means depending on the primitive type/data type of argument passed, respective aggregation is being done. i.e, for name1,name2(strings) corresponding values for aggregations are from new DoubleWritable(0);
So => 0.0+0.0+3 = 3.0
Upvotes: 4