Andrej Kesely
Andrej Kesely

Reputation: 195418

Efficient searching for substrings in SQL [Python/SQLite]

I have two SQLite tables (list1 and list2) each with only one text column (val). I want efficiently search all combinations, where list2.value can be substring in list1.value.

Currently I have this solution:

import sqlite3

list1 = ["this is string1", "this is string2", "this is string3"]
list2 = ["string1", "string2"]

in_memory = sqlite3.connect(':memory:')
c = in_memory.cursor()
c.execute('CREATE TABLE list1 (val text NOT NULL)')
c.execute('CREATE TABLE list2 (val text NOT NULL)')

for v in list1:
    c.execute("INSERT INTO list1 VALUES (?)", (v, ))

for v in list2:
    c.execute("INSERT INTO list2 VALUES (?)", (v, ))

l = [*c.execute("SELECT list1.val, list2.val FROM list1, list2 WHERE instr(list1.val, list2.val)")]
print(l)

Prints correctly:

[('this is string1', 'string1'), ('this is string2', 'string2')]

Is there more effective SQL solution than iterating over each list1.val and list2.val combination and search if there's substring?

Upvotes: 1

Views: 377

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269623

You can phrase this as a single query:

select l1.value, l2.value
from list1 l1 join
     list2 l2
     on l1.val like '%' || l2.val || '%';

Doing the loop inside the database is slightly more efficient that doing the loop yourself -- because only matching rows are returned and you don't have the overhead of multiple queries.

However, this will still be doing nested loops. Such a query cannot take advantage of traditional indexes.

Upvotes: 2

Related Questions