AndreasInfo

Reputation: 1227

Implementing sketches in Cassandra using UDF and UDA

I am trying to implement a UDF in Cassandra that takes the number of rows in the queried table as a parameter. The UDF looks like this:

CREATE OR REPLACE FUNCTION hashvalue(value text, size int)
RETURNS NULL ON NULL INPUT
RETURNS int
LANGUAGE java
AS 
$$
return Math.abs(value.hashCode() % size);
$$;

The query is supposed to look like this:

SELECT name, hashvalues(name, (SELECT count(*) FROM test_table) AS hash) FROM test_table;

I am expecting something like:

name    hash
text1   184
text2   932
text3   3
[...]

I get this error:

SyntaxException: line 1:25 no viable alternative at input 'SELECT' (SELECT hashvalues(name, [(]SELECT...)

My general idea is to collect the hash values in a map and, at the end, use a UDA to turn that map into a sketch (e.g. a Bloom filter).

Is this possible at all, or am I approaching this the wrong way? Thanks

EDIT:

Storing all the sketch information in a UDT is a better approach. This is as far as I got...

CREATE TYPE bloomfilter_udt(
  n_as_sample_size int,
  m_as_number_of_buckets int,
  p_as_next_prime_above_m bigint,
  hash_for_string_coefficient_a list<bigint>,
  hash_for_number_coefficients_a list<bigint>,
  hash_for_number_coefficients_b list<bigint>,
  bloom_filter_as_map map<int, int>
);

CREATE OR REPLACE FUNCTION bloomfilter_udf(state bloomfilter_udt, value text, sample_size int)
CALLED ON NULL INPUT
RETURNS bloomfilter_udt
LANGUAGE java
AS 
$$
//do something
return state;
$$;

CREATE OR REPLACE AGGREGATE bloomfilter_uda(text, int)
SFUNC bloomfilter_udf
STYPE bloomfilter_udt
INITCOND {};

Upvotes: 1

Views: 156

Answers (1)

Aaron

Reputation: 57798

CQL does not allow subqueries.
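
Since a subquery cannot supply the row count, the count has to come from a separate step and be passed in as an ordinary value. Here is a minimal Java sketch of that idea, under some assumptions: the class HashBuckets and the method bucketAll are hypothetical names, the list of names stands in for the rows of test_table, and its size stands in for the result of a separately executed SELECT count(*). No driver calls are shown. It also uses Math.floorMod instead of the Math.abs(... % size) from the question, which is a deliberate swap and yields different buckets for negative hash codes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HashBuckets {
    // Maps a string to a bucket in [0, size), like the hashvalue UDF in the question.
    // Math.floorMod always returns a non-negative result for a positive size.
    static int hashValue(String value, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        return Math.floorMod(value.hashCode(), size);
    }

    // Assumption: 'names' stands in for the name column of test_table, and
    // names.size() stands in for the result of a separate SELECT count(*).
    static Map<String, Integer> bucketAll(List<String> names) {
        int size = names.size();
        Map<String, Integer> buckets = new LinkedHashMap<>();
        for (String name : names) {
            buckets.put(name, hashValue(name, size));
        }
        return buckets;
    }

    public static void main(String[] args) {
        Map<String, Integer> buckets = bucketAll(List.of("text1", "text2", "text3"));
        // Prints one "name<TAB>hash" line per input name.
        buckets.forEach((name, hash) -> System.out.println(name + "\t" + hash));
    }
}
```

The same two-step pattern would apply if the UDF stayed in Cassandra: the client would read the count first and then pass it to hashvalue as a plain int argument.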

You'll probably have to rethink the approach here. Spending the engineering time to do this on the application side is likely to pay off far more than trying to do it inside Cassandra.
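
To illustrate what "on the application side" could look like for the Bloom filter mentioned in the question, here is a minimal sketch, not a definitive implementation. It assumes hash functions of the form ((a * x + b) mod p) mod m, which is only a guess at what the n, m, p and coefficient fields of bloomfilter_udt were meant for. The class name BloomFilterSketch, the fixed prime 2^31 - 1, and the seeded random coefficients are all hypothetical choices made for the example.

```java
import java.util.BitSet;
import java.util.Random;

class BloomFilterSketch {
    // Assumed prime p = 2^31 - 1; it is at least as large as any int bucket count m.
    private static final long P = 2147483647L;

    private final int m;        // number of buckets (bits)
    private final long[] a;     // per-hash-function multipliers in [1, P-1]
    private final long[] b;     // per-hash-function offsets in [0, P-1]
    private final BitSet bits;

    BloomFilterSketch(int m, int k, long seed) {
        if (m <= 0 || k <= 0) {
            throw new IllegalArgumentException("m and k must be positive");
        }
        this.m = m;
        this.a = new long[k];
        this.b = new long[k];
        this.bits = new BitSet(m);
        Random rnd = new Random(seed);
        for (int i = 0; i < k; i++) {
            a[i] = 1 + (long) rnd.nextInt((int) (P - 1));
            b[i] = rnd.nextInt((int) P);
        }
    }

    // Computes ((a_i * x + b_i) mod p) mod m, where x is the string's hash code
    // taken as an unsigned value and reduced mod p so the product fits in a long.
    private int bucket(int i, String value) {
        long x = (value.hashCode() & 0xffffffffL) % P;
        return (int) (((a[i] * x + b[i]) % P) % m);
    }

    void add(String value) {
        for (int i = 0; i < a.length; i++) {
            bits.set(bucket(i, value));
        }
    }

    // False means "definitely not added"; true means "possibly added".
    boolean mightContain(String value) {
        for (int i = 0; i < a.length; i++) {
            if (!bits.get(bucket(i, value))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1024, 3, 42L);
        filter.add("text1");
        filter.add("text2");
        System.out.println(filter.mightContain("text1")); // true: added values are never reported missing
    }
}
```

In this setup the application would read the name column from test_table, call add for each value, and store or ship the resulting bit set however it needs to. Lookups for values that were never added can still return true occasionally, which is the usual Bloom filter trade-off.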

Upvotes: 1
