davka
davka

Reputation: 14692

how to select the newly added rows in a table efficiently?

I need to periodically update a local cache with new additions to some DB table. The table rows contain an auto-increment sequential number (SN) field. The cache keeps this number too, so basically I just need to fetch all rows with SN larger than the highest I already have.

SELECT * FROM table where SN > <max_cached_SN>

However, the majority of the attempts will bring no data (I just need to make sure that I have an absolutely up-to-date local copy). So I wander if this will be more efficient:

count = SELECT count(*) from table;
if (count > <cache_size>)
  // fetch new rows as above

I suppose that selecting by an indexed numeric field is quite efficient, so I wander whether using count has benefit. On the other hand, this test/update will be done quite frequently and by many clients, so there is a motivation to optimize it.

Upvotes: 1

Views: 2228

Answers (5)

ABI
ABI

Reputation: 1754

You don't need to use SELECT COUNT(*)

There is two solution.

  1. You can use a temp table that has one field that contain last count of your table, and create new Trigger after insert on your table and inc temp table field in Trigger.

  2. You can use a temp table that has one field that contain last SN of your table is cached and create new Trigger after insert on your table and update temp table field in Trigger.

Upvotes: 1

Jon Black
Jon Black

Reputation: 16559

not much to this really

drop table if exists foo;
create table foo
(
foo_id int unsigned not null auto_increment primary key
)
engine=innodb;

insert into foo values (null),(null),(null),(null),(null),(null),(null),(null),(null);

select * from foo order by foo_id desc limit 10;

insert into foo values (null),(null),(null),(null),(null),(null),(null),(null),(null);

select * from foo order by foo_id desc limit 10;

Upvotes: -1

ajreal
ajreal

Reputation: 47321

this test/update will be done quite frequently and by many clients

this could lead to unexpected race competition for cache generation

I would suggest

  • upon new addition to your table add the newest id into a queue table
  • using like crontab to trigger the cache generation by checking queue table
  • upon new cache generated, delete the id from queue table

as you stress majority of the attempts will bring no data, the above will only trigger where there is new addition

and the queue table concept, even can expand for update and delete

Upvotes: 3

Victor Haydin
Victor Haydin

Reputation: 3548

I believe that

SELECT * FROM table where SN > <max_cached_SN>

will be faster, because select count(*) may call table scan. Just for clarification, do you never delete rows from this table?

Upvotes: 2

9000
9000

Reputation: 40894

SELECT COUNT(*) may involve a scan (even a full scan), while SELECT ... WHERE SN > constant can effectively use an index by SN, and looking at very few index nodes may suffice. Don't count items if you don't need the exact total, it's expensive.

Upvotes: 1

Related Questions