Reputation: 1691
I saw some related topics here , but still it isn't clear to me, How group by the latest row values with cassandra 4.0.1
Let's say my table looks like;
CREATE TABLE simple_search (
engine text,
term text,
time bigint,
rank bigint,
url text,
domain text,
pagenum bigint,
descr text,
display_url text,
title text,
type text,
PRIMARY KEY ((domain), term , time , engine, url , pagenum)
) WITH CLUSTERING ORDER BY (term DESC, time DESC, engine DESC , url DESC);
My data looks like:
SELECT time, rank, term from search_by_domain_termsV2 where domain ='zerotoappstore.com'
time , rank, term
1633297772, 105, avfoundation swift
1633315263, 112, best ide
1633332881, 119, best ide
1633365856, 50, developing an app cost
1633375273, 36, developing an app cost
I want to have after group by
time , rank, term
1633297772, 105, avfoundation swift
1633332881, 119, best ide
1633375273, 36, developing an app cost
If I do
SELECT max(time) , rank, term from search_by_domain_termsV2 where domain ='zerotoappstore.com' GROUP BY term;
it gives me the correct max time value but not correct rating,and term.
1633297772 105 avfoundation swift
1633332881 112 best ide
1633375273 50 developing an app cost
Is it possible to group by term and take the max value of time ?
Upvotes: 2
Views: 169
Reputation: 649
@VitalyT,
First, if we're not specifying the pagenum
as part of the clustering order by clause of the create table construct, it would give you an error as follows:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Clustering key columns must exactly match columns in CLUSTERING ORDER BY directive"
so, it has to be like as follows:
CREATE TABLE IF NOT EXISTS simple_search(
...
PRIMARY KEY (domain, term, time, engine, url, pagenum)
) WITH CLUSTERING ORDER BY (term DESC, time DESC, engine DESC, url [ASC|DESC]);
Next, with the give data sample of 5 rows. Note, I assumed some values for engine
, url
, pagenum
columns as those values weren't provided in the original question:
SELECT * FROM simple_search ;
domain | term | time | engine | url | pagenum | descr | display_url | rank | title | type
--------------------+------------------------+------------+---------+------+---------+-------+-------------+------+-------+------
zerotoappstore.com | developing an app cost | 1633375273 | engine5 | url5 | 5 | null | null | 36 | null | null
zerotoappstore.com | developing an app cost | 1633365856 | engine4 | url4 | 4 | null | null | 50 | null | null
zerotoappstore.com | best ide | 1633332881 | engine3 | url3 | 3 | null | null | 119 | null | null
zerotoappstore.com | best ide | 1633315263 | engine2 | url2 | 2 | null | null | 112 | null | null
zerotoappstore.com | avfoundation swift | 1633297772 | engine1 | url1 | 1 | null | null | 105 | null | null
(5 rows)
we would get the following result if we only retrieve the MAX(time)
column (without any GROUP BY
):
SELECT MAX(time),rank,term FROM simple_search WHERE domain = 'zerotoappstore.com';
system.max(time) | rank | term
------------------+------+------------------------
1633375273 | 36 | developing an app cost
(1 rows)
Now, let's see what happens if we include the GROUP BY term
clause to the same exact SELECT
statement:
SELECT MAX(time), rank, term FROM simple_search WHERE domain = 'zerotoappstore.com' GROUP BY term;
system.max(time) | rank | term
------------------+------+------------------------
1633375273 | 36 | developing an app cost
1633332881 | 119 | best ide
1633297772 | 105 | avfoundation swift
(3 rows)
What if we remove the MAX
aggregate function on time
column because we've the data already stored in the descending order for time
column? We get the following:
SELECT time,rank,term FROM simple_search WHERE domain = 'zerotoappstore.com' GROUP BY term;
time | rank | term
------------+------+------------------------
1633375273 | 36 | developing an app cost
1633332881 | 119 | best ide
1633297772 | 105 | avfoundation swift
(3 rows)
Is this what you want as your result? Please also see the corresponding documentation for certain conditions as it is laid out.
Upvotes: 2