Reputation: 1227
I am trying to implement a UDF in Cassandra that takes the number of rows in the queried table as a parameter. The UDF looks like this:
CREATE OR REPLACE FUNCTION hashvalues(value text, size int)
RETURNS NULL ON NULL INPUT
RETURNS int
LANGUAGE java
AS
$$
return Math.abs(value.hashCode() % size);
$$;
The query is supposed to look like this:
SELECT name, hashvalues(name, (SELECT count(*) FROM test_table) AS hash) FROM test_table;
I am expecting something like:
name hash
text1 184
text2 932
text3 3
[...]
I get this error:
SyntaxException: line 1:25 no viable alternative at input 'SELECT' (SELECT hashvalues(name, [(]SELECT...)
My general idea is to collect all the hash values in a map and, at the end, turn that map into a sketch (e.g. a Bloom filter) via a UDA.
Is this possible, or am I approaching this the wrong way? Thanks
EDIT:
Storing all the sketch information in a UDT is a better approach. This is as far as I got...
CREATE TYPE bloomfilter_udt(
n_as_sample_size int,
m_as_number_of_buckets int,
p_as_next_prime_above_m bigint,
hash_for_string_coefficient_a list<bigint>,
hash_for_number_coefficients_a list<bigint>,
hash_for_number_coefficients_b list<bigint>,
bloom_filter_as_map map<int, int>
);
CREATE OR REPLACE FUNCTION bloomfilter_udf(state bloomfilter_udt, value text, sample_size int)
CALLED ON NULL INPUT
RETURNS bloomfilter_udt
LANGUAGE java
AS
$$
//do something
return state;
$$;
CREATE OR REPLACE AGGREGATE bloomfilter_uda(text, int)
SFUNC bloomfilter_udf
STYPE bloomfilter_udt
INITCOND {};
Upvotes: 1
Views: 156
Reputation: 57798
CQL does not allow subqueries, which is why the parser rejects the nested SELECT inside the function call.
You will have to rethink the approach here. The engineering time is probably better spent doing this on the application side rather than in Cassandra.
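As a rough illustration of the application-side route, here is a minimal Java sketch. It assumes the names have already been fetched with a plain "SELECT name FROM test_table" and are held in a list; the driver code is left out on purpose. The class AppSideHashing and its methods bucket and hashAll are hypothetical names made up for this example. The bucket formula is the same one used in the UDF from the question.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Cassandra or any driver.
class AppSideHashing {

    // Same formula as the UDF in the question: a bucket in [0, size).
    static int bucket(String value, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        return Math.abs(value.hashCode() % size);
    }

    // 'names' stands in for the rows returned by "SELECT name FROM test_table".
    static Map<String, Integer> hashAll(List<String> names) {
        int size = names.size(); // takes the place of the count(*) subquery
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String name : names) {
            result.put(name, bucket(name, size));
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample data, mirroring the names from the question.
        List<String> names = List.of("text1", "text2", "text3");
        hashAll(names).forEach((name, hash) -> System.out.println(name + " " + hash));
    }
}
```

Because the row count is known once the result set is in memory, no second query is needed. The resulting map could then be fed into whatever sketch structure you build on the client.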
Upvotes: 1