Implementing sketches in Cassandra using UDF and UDA

Question

I am trying to implement a UDF in Cassandra that takes, as a parameter, the number of rows returned by the query itself. The UDF looks like this:

CREATE OR REPLACE FUNCTION hashvalue(value text, size int)
RETURNS NULL ON NULL INPUT
RETURNS int
LANGUAGE java
AS 
$$
return Math.abs(value.hashCode() % size);
$$;

The query is supposed to look like this:

SELECT name, hashvalues(name, (SELECT count(*) FROM test_table) AS hash) FROM test_table;

I am expecting something like:

name    hash
text1   184
text2   932
text3   3
[...]

I get this error:

SyntaxException: line 1:25 no viable alternative at input 'SELECT' (SELECT hashvalues(name, [(]SELECT...)

My general idea is to collect all of the hashes in a map and then, at the end, turn that map into a sketch (e.g. a Bloom filter) with a UDA.
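As a rough illustration of the end result, here is a plain-Java Bloom filter running outside Cassandra. It is a minimal sketch under my own assumptions: the class name `BloomFilterSketch` is hypothetical, and deriving the k bucket indexes by double hashing from `String.hashCode()` is just one convenient choice. Each added value sets k of the m bits; a membership check can return false positives but never false negatives.

```java
import java.util.BitSet;

// Minimal Bloom filter sketch (illustrative only, not Cassandra UDF code).
// m = number of bits, k = number of hash functions per value.
// The double-hashing scheme below is an assumption chosen for brevity.
final class BloomFilterSketch {
    private final BitSet bits;
    private final int m;
    private final int k;

    BloomFilterSketch(int m, int k) {
        if (m <= 0 || k <= 0) throw new IllegalArgumentException("m and k must be positive");
        this.bits = new BitSet(m);
        this.m = m;
        this.k = k;
    }

    // i-th bucket index: h1 + i * h2, reduced with floorMod so it always lies in [0, m).
    private int index(String value, int i) {
        int h1 = value.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x5bd1e995; // hypothetical second hash
        return Math.floorMod(h1 + i * h2, m);
    }

    // Set the k bits belonging to this value.
    void add(String value) {
        for (int i = 0; i < k; i++) bits.set(index(value, i));
    }

    // false means "definitely never added"; true means "possibly added".
    boolean mightContain(String value) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(index(value, i))) return false;
        }
        return true;
    }
}
```

In a UDA, the bit set would have to live in the aggregate's state (for example the `bloom_filter_as_map` field below) rather than in a Java object, because the state is passed between SFUNC calls as a CQL value.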

Is this possible at all, or am I approaching it the wrong way? Thanks

EDIT:

Storing all of the sketch information in a UDT seems like a better approach. This is as far as I got:

CREATE TYPE bloomfilter_udt(
  n_as_sample_size int,
  m_as_number_of_buckets int,
  p_as_next_prime_above_m bigint,
  hash_for_string_coefficient_a list ,
  hash_for_number_coefficients_a list ,
  hash_for_number_coefficients_b list ,
  bloom_filter_as_map map
);

CREATE OR REPLACE FUNCTION bloomfilter_udf(state bloomfilter_udt, value text, sample_size int)
CALLED ON NULL INPUT
RETURNS bloomfilter_udt
LANGUAGE java
AS 
$$
//do something
return state;
$$;

CREATE OR REPLACE AGGREGATE bloomfilter_uda(text, int)
SFUNC bloomfilter_udf
STYPE bloomfilter_udt
INITCOND {};
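For reference, Cassandra evaluates an aggregate by starting from the INITCOND state and calling the SFUNC once per row, passing the current state first and then the aggregate's arguments; here that means `bloomfilter_udf(state, value, sample_size)` is called for every `(text, int)` pair. The plain-Java simulation below is a hypothetical sketch of what the `//do something` body could compute: `BloomState` stands in for `bloomfilter_udt`, and the hash family ((a * x + b) mod p) mod m is an assumption inferred from the UDT's field names, not the actual implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for bloomfilter_udt (only the fields the sketch needs).
final class BloomState {
    int m;          // m_as_number_of_buckets
    long p;         // p_as_next_prime_above_m (assumed < 2^31 so a * x + b cannot overflow a long)
    long[] a;       // hash_for_number_coefficients_a
    long[] b;       // hash_for_number_coefficients_b
    Map<Integer, Boolean> buckets = new HashMap<>(); // bloom_filter_as_map
}

// Simulates SELECT bloomfilter_uda(name, n) FROM test_table over an in-memory column.
final class BloomAggregateSim {
    // Counterpart of bloomfilter_udf: update the state for one row and return it.
    // sampleSize mirrors the UDF signature but is not used in this sketch.
    static BloomState sfunc(BloomState state, String value, int sampleSize) {
        if (value == null) return state; // CALLED ON NULL INPUT: skip null rows explicitly
        long x = Math.floorMod((long) value.hashCode(), state.p); // string -> [0, p)
        for (int i = 0; i < state.a.length; i++) {
            // Universal hash ((a * x + b) mod p) mod m gives a bucket in [0, m).
            long h = Math.floorMod(state.a[i] * x + state.b[i], state.p) % state.m;
            state.buckets.put((int) h, Boolean.TRUE);
        }
        return state;
    }

    // Fold: start from the initial state and apply sfunc to each row in turn.
    static BloomState aggregate(BloomState init, List<String> rows, int sampleSize) {
        BloomState state = init;
        for (String row : rows) state = sfunc(state, row, sampleSize);
        return state;
    }
}
```

Note that the state starts from INITCOND, so the coefficients and `p` would have to be present in the initial value (or filled in on the first SFUNC call) for this kind of hashing to work.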

Aaron · Accepted Answer

CQL does not support subqueries, so `(SELECT count(*) FROM test_table)` cannot be used as a function argument.

You'll probably have to rethink the approach. There is likely much more value in spending the engineering time to do this on the application side rather than inside Cassandra.
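Done in the application, the subquery simply becomes two steps: count the rows, then hash each name with that count. A minimal sketch, assuming the names have already been fetched into a `List<String>` by whatever driver the application uses (driver calls are deliberately left out, and `AppSideHashing` is a hypothetical name):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Application-side version of the original query (illustrative only).
// `names` stands in for the result of SELECT name FROM test_table.
final class AppSideHashing {
    // Same logic as the hashvalue UDF: a bucket in [0, size) for size > 0.
    static int hashValue(String value, int size) {
        return Math.abs(value.hashCode() % size);
    }

    // Step 1: count the rows (what the subquery was meant to provide).
    // Step 2: hash every name with that count. Duplicate names collapse to one entry.
    static Map<String, Integer> hashAll(List<String> names) {
        int size = names.size();
        Map<String, Integer> result = new LinkedHashMap<>();
        if (size == 0) return result; // avoid taking a remainder by zero
        for (String name : names) result.put(name, hashValue(name, size));
        return result;
    }
}
```

The resulting hashes could then feed a client-side sketch directly, without the restrictions Cassandra places on UDF code.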
