rvd
rvd

Reputation: 337

How Hive sums across string column?

I am using apache-hive-1.2.1 and created a table :
test_table : key -> integer and name -> string

key     name
--------------
1       name1
2       name2
3       3

I am surprised how following queries are giving the corresponding outputs, since
the column is of type string :

select sum(name) from test_table;
Output : 3.0

and

select avg(name) from test_table;
Output : 3.0

Does hive perform aggregations even if the column type is string? How hive does this?

Upvotes: 1

Views: 4206

Answers (2)

Abhis
Abhis

Reputation: 605

An aggregate function that returns the sum of a set of numbers. Its single argument can be numeric column, or the numeric result of a function or expression applied to the column value. Rows with a NULL value for the specified column are ignored. If the table is empty, or all the values supplied to MIN are NULL, SUM returns NULL.

When the query contains a GROUP BY clause, returns one value for each combination of grouping values.

Return type: BIGINT for integer arguments, DOUBLE for floating-point arguments

Upvotes: 0

vijay kumar
vijay kumar

Reputation: 2049

please refer the code of hive builtin UDAF-sum() , which takes the signature below accepts "Only numeric or string type arguments(line-71)

name = "sum", value = "_FUNC_(x) - Returns the sum of a set of numbers"

Inside the GenericUDAFEvaluator() you can find case STRING: return new GenericUDAFSumDouble(); at line 66 and 67 , which means depending on the primitive type/data type of argument passed, respective aggregation is being done. i.e, for name1,name2(strings) corresponding values for aggregations are from new DoubleWritable(0);
So => 0.0+0.0+3 = 3.0

Upvotes: 4

Related Questions