Andrew Ho-Lee
Andrew Ho-Lee

Reputation: 155

Python performance: search large list vs sqlite

Lets say I have a database table which consists of three columns: id, field1 and field2. This table may have anywhere between 100 and 100,000 rows in it. I have a python script that should insert 10-1,000 new rows into this table. However, if the new field1 already exists in the table, it should do an UPDATE, not an INSERT.

Which of the following approaches is more efficient?

  1. Do a SELECT field1 FROM table (field1 is unique) and store that in a list. Then, for each new row, use list.count() to determine whether to INSERT or UPDATE
  2. For each row, run two queries. Firstly, SELECT count(*) FROM table WHERE field1="foo" then either the INSERT or UPDATE.

In other words, is it more efficient to perform n+1 queries and search a list, or 2n queries and get sqlite to search?

Upvotes: 2

Views: 3031

Answers (4)

MarkR
MarkR

Reputation: 63616

You appear to be comparing apples with oranges.

A python list is only useful if your data fit into the address-space of the process. Once the data get big, this won't work any more.

Moreover, a python list is not indexed - for that you should use a dictionary.

Finally, a python list is non-persistent - it is forgotten when the process quits.

How can you possibly compare these?

Upvotes: 0

Dan Stocker
Dan Stocker

Reputation: 702

I'm not familiar with sqlite but a general approach like this should work:

If there's a unique index on field1 and you're trying to insert a value that's already there you should get an error. If insert fails, you go with the update.

Pseudocode:

try
{
    insert into table (value1, value2)
}
catch(insert fails)
{
    update table set field2=value2 where field1=value1
}

Upvotes: 1

Nick
Nick

Reputation: 1825

If I understand your question correctly, it seems like you could simply use SQLite's built in conflict handling mechanism.

Assuming you have a UNIQUE constraint on field1, you could simple use:

INSERT OR REPLACE INTO table VALUES (...)

The following syntax is also supported (identical semantics):

REPLACE INTO table VALUES (...)

EDIT: I realise that I am not really answering your question, just providing an alternative solution which should be faster.

Upvotes: 9

sarnold
sarnold

Reputation: 104090

I imagine using a python dictionary would allow for much faster searching than using a python list. (Just set the values to 0, you won't need them, and hopefully a '0' stores compactly.)

As for the larger question, I'm curious too. :)

Upvotes: 0

Related Questions