Andonaeus
Andonaeus

Reputation: 872

Aggregating Set Type in Cassandra

I am trying to aggregate rows which each contain a column of Set. I would like the result to contain the sum of all sets, where null would be equivalent to the empty set. I expected a query like: "select sum(my_set_column) from my_table group by my_key_column" to do this, but the set type is not supported by this aggregate. Does anyone know of a way to aggregate this using existing cassandra built-ins? Thanks!

Upvotes: 0

Views: 168

Answers (2)

Andonaeus
Andonaeus

Reputation: 872

I was able to make this work with the following UDF/UDA. This could be considered a general purpose set aggregator if anyone else needs it in the future:

CREATE OR REPLACE FUNCTION agg_set_func(state tuple<int, set<bigint>>, val set<bigint>) CALLED ON NULL INPUT RETURNS tuple<int, set<bigint>> LANGUAGE java AS 
$$
    if (val == null) {
        return state;
    }
    Set<Long> s = state.getSet(1, Long.class);
    s.addAll(val);
    state.setSet(1, s);
    return state;
$$;

CREATE OR REPLACE FUNCTION agg_set_func_final(state tuple<int, set<bigint>>) CALLED ON NULL INPUT RETURNS set<bigint> LANGUAGE java AS 
$$
    return state.getSet(1, Long.class);
$$;

CREATE AGGREGATE agg_set(set<bigint>) 
SFUNC agg_set_func
STYPE tuple<int, set<bigint>>
FINALFUNC agg_set_func_final
INITCOND (0,{});

Upvotes: 0

Chris Lohfink
Chris Lohfink

Reputation: 16400

You have to write a udf (sum a single set) or a uda (sum multiple sets) but it should be pretty straight forward. Something like

CREATE FUNCTION set_sum(values set<int>)
CALLED ON NULL INPUT
RETURNS int
LANGUAGE java
AS $$
  if (values == null)
    return 0;

  int total = 0;
  for (int v : values)
    total += v;
  return total;
$$;

Then SELECT set_sum(value_column) FROM my_table WHERE id = 'key'; would aggregate the values of the set column in each row. If you want to merge multiple rows (ie with group by) you need to use a UDA which can use above function with an int state to replace total instead of starting from 0 each time.

Upvotes: 2

Related Questions