VitalyT
VitalyT

Reputation: 1691

Cassandra - How group by latest timestamp

I saw some related topics here , but still it isn't clear to me, How group by the latest row values with cassandra 4.0.1

Let's say my table looks like;

CREATE TABLE simple_search (
    engine text,
    term text,
    time bigint,
    rank bigint,
    url text,
    domain text,
    pagenum bigint,
    descr text,
    display_url text,
    title text,
    type text,
    PRIMARY KEY ((domain), term , time , engine, url , pagenum)
) WITH CLUSTERING ORDER BY (term DESC, time DESC,  engine DESC , url DESC);

My data looks like:

SELECT time, rank, term  from search_by_domain_termsV2 where domain ='zerotoappstore.com' 



time ,    rank, term 
1633297772, 105,  avfoundation swift
1633315263, 112,  best ide
1633332881, 119,  best ide
1633365856, 50,   developing an app cost
1633375273, 36,   developing an app cost

I want to have after group by

time ,    rank, term 
1633297772, 105,  avfoundation swift
1633332881, 119,  best ide
1633375273, 36,   developing an app cost

If I do

SELECT max(time) , rank, term  from search_by_domain_termsV2 where domain ='zerotoappstore.com'  GROUP BY term;

it gives me the correct max time value but not correct rating,and term.

1633297772  105 avfoundation swift
1633332881  112 best ide
1633375273  50  developing an app cost

Is it possible to group by term and take the max value of time ?

Upvotes: 2

Views: 169

Answers (1)

Madhavan
Madhavan

Reputation: 649

@VitalyT,

First, if we're not specifying the pagenum as part of the clustering order by clause of the create table construct, it would give you an error as follows:

InvalidRequest: Error from server: code=2200 [Invalid query] message="Clustering key columns must exactly match columns in CLUSTERING ORDER BY directive"

so, it has to be like as follows:

CREATE TABLE IF NOT EXISTS simple_search(
...
PRIMARY KEY (domain, term, time, engine, url, pagenum)
) WITH CLUSTERING ORDER BY (term DESC, time DESC, engine DESC, url [ASC|DESC]);

Next, with the give data sample of 5 rows. Note, I assumed some values for engine, url, pagenum columns as those values weren't provided in the original question:

SELECT * FROM simple_search ;
 domain             | term                   | time       | engine  | url  | pagenum | descr | display_url | rank | title | type
--------------------+------------------------+------------+---------+------+---------+-------+-------------+------+-------+------
 zerotoappstore.com | developing an app cost | 1633375273 | engine5 | url5 |       5 |  null |        null |   36 |  null | null
 zerotoappstore.com | developing an app cost | 1633365856 | engine4 | url4 |       4 |  null |        null |   50 |  null | null
 zerotoappstore.com |               best ide | 1633332881 | engine3 | url3 |       3 |  null |        null |  119 |  null | null
 zerotoappstore.com |               best ide | 1633315263 | engine2 | url2 |       2 |  null |        null |  112 |  null | null
 zerotoappstore.com |     avfoundation swift | 1633297772 | engine1 | url1 |       1 |  null |        null |  105 |  null | null

(5 rows)

we would get the following result if we only retrieve the MAX(time) column (without any GROUP BY):

SELECT MAX(time),rank,term FROM simple_search WHERE domain = 'zerotoappstore.com';

 system.max(time) | rank | term
------------------+------+------------------------
       1633375273 |   36 | developing an app cost

(1 rows)

Now, let's see what happens if we include the GROUP BY term clause to the same exact SELECT statement:

SELECT MAX(time), rank, term FROM simple_search WHERE domain = 'zerotoappstore.com' GROUP BY term;
 system.max(time) | rank | term
------------------+------+------------------------
       1633375273 |   36 | developing an app cost
       1633332881 |  119 |               best ide
       1633297772 |  105 |     avfoundation swift

(3 rows)

What if we remove the MAX aggregate function on time column because we've the data already stored in the descending order for time column? We get the following:

SELECT time,rank,term FROM simple_search WHERE domain = 'zerotoappstore.com' GROUP BY term;

 time       | rank | term
------------+------+------------------------
 1633375273 |   36 | developing an app cost
 1633332881 |  119 |               best ide
 1633297772 |  105 |     avfoundation swift

(3 rows)

Is this what you want as your result? Please also see the corresponding documentation for certain conditions as it is laid out.

Upvotes: 2

Related Questions