Reputation: 872
I am trying to aggregate rows which each contain a column of Set. I would like the result to contain the sum of all sets, where null would be equivalent to the empty set. I expected a query like: "select sum(my_set_column) from my_table group by my_key_column" to do this, but the set type is not supported by this aggregate. Does anyone know of a way to aggregate this using existing cassandra built-ins? Thanks!
Upvotes: 0
Views: 168
Reputation: 872
I was able to make this work with the following UDF/UDA. This could be considered a general purpose set aggregator if anyone else needs it in the future:
CREATE OR REPLACE FUNCTION agg_set_func(state tuple<int, set<bigint>>, val set<bigint>) CALLED ON NULL INPUT RETURNS tuple<int, set<bigint>> LANGUAGE java AS
$$
if (val == null) {
return state;
}
Set<Long> s = state.getSet(1, Long.class);
s.addAll(val);
state.setSet(1, s);
return state;
$$;
CREATE OR REPLACE FUNCTION agg_set_func_final(state tuple<int, set<bigint>>) CALLED ON NULL INPUT RETURNS set<bigint> LANGUAGE java AS
$$
return state.getSet(1, Long.class);
$$;
CREATE AGGREGATE agg_set(set<bigint>)
SFUNC agg_set_func
STYPE tuple<int, set<bigint>>
FINALFUNC agg_set_func_final
INITCOND (0,{});
Upvotes: 0
Reputation: 16400
You have to write a udf (sum a single set) or a uda (sum multiple sets) but it should be pretty straight forward. Something like
CREATE FUNCTION set_sum(values set<int>)
CALLED ON NULL INPUT
RETURNS int
LANGUAGE java
AS $$
if (values == null)
return 0;
int total = 0;
for (int v : values)
total += v;
return total;
$$;
Then SELECT set_sum(value_column) FROM my_table WHERE id = 'key';
would aggregate the values of the set column in each row. If you want to merge multiple rows (ie with group by) you need to use a UDA which can use above function with an int state to replace total
instead of starting from 0 each time.
Upvotes: 2