igx
igx

Reputation: 4231

MATERIALIZED VIEW VS query with ALLOW FILTERING

I have the following table users((username), last_seen) where last_seen is basically the writetime . Number of records in the table is aprox 10 million. Insert is pretty much straightforward insert into users (username, last_seen) VALUES ([username], now)

BUT I need to query by the last_seen column (this query runs every minute) e.g :

select username from users where last_seen < (now - 1 day) 

I have two options as I see it:

  1. use materialized view :

    CREATE MATERIALIZED VIEW users_last_seen AS
    SELECT last_seen, username
    FROM users
    WHERE last_seen IS NOT NULL
    PRIMARY KEY (last_seen, username);
    

    and simply query

    select username from users_last_seen where last_seen < (now - 1 day)
    
  2. query the users table

    select username from users where last_seen < (now - 1 day) ALLOW FILTERING
    

which one is more efficient? AFAIK materialized view is unstable and have impact on performance.

Upvotes: 0

Views: 1026

Answers (2)

igx
igx

Reputation: 4231

I found out that using SASI Index is the best option in this situation

Upvotes: 0

chaitan64arun
chaitan64arun

Reputation: 783

AFAIK materialized view is unstable

At this point of time, I think so too. But that's not a reason to use ALLOW FILTERING. If not more, it is equally bad.

I would suggest to create another table or change the current structure.

CREATE TABLE IF NOT EXISTS user_status (
    day date,
    id timeuuid,
    user text,
    PRIMARY KEY ((day), id)
);

This table is partitioned by each day. You will need to only query over that day's data and prepare the your data on client side program.

select * from user_status where day = '2015-05-03'

This is both not heavy on server and client side. Depending on the expected size of data, the partition key can be further tuned.

Hope this helps!

Upvotes: 1

Related Questions