Pradyut Bhattacharya
Pradyut Bhattacharya

Reputation: 5748

sql find rows partially matching a string

I want to find rows in table having rows which contains a string

For example, I m having rows in a column names 'atest' in a table named 'testing' -

test
a
cool
another

now I want to select the rows having a word from the string 'this is a test' using a sql

select * from testing where instr(atext, 'this is a test') >0;

but this is not selecting any row.

Upvotes: 0

Views: 2775

Answers (3)

Kovge
Kovge

Reputation: 2019

This is a 'reversed' like:

select * from testing where 'this is a test' LIKE CONCAT('%',atext,'%');

It can be slow on tables having a lot of records. This returns the rows, where the value of the atext column can be found in the given string. (for example matches when atext = 'is a t' because it can be found in the given string)

Or you can write a regex.

select * from testing where atext REGEXP '^(this|is|a|test)$';

This matching all rows what contains exactly the specified words. In your scripting or programming language, you should only replace spaces with | and add ^ to the begining of the string and $ to the ending of the string, and REGEXP, not equation. ("this is a test" -> ^this|is|a|test$ )

If you have a lot of records in the table, this queries can be slow. Because the sql engine does not use indexes in regexp queries.

So if you have a lot of rows on your table and does not have more than 4 000 000 words i recommend to make an indexing table. Example:

originalTable:
tid | atext (text)         
1   | this is        
2   | a word         
3   | a this
4   | this word      
5   | a is
....



indexTable:
wid | word (varchar)
1   | this
2   | is
3   | a
4   | word


switchTable:
tid | wid
1   | 1
1   | 2 
2   | 3
2   | 4
3   | 1
3   | 3
...

You should set indexes, tid, wid and word fields.

Than the query is:

SELECT o.*
FROM originalTable as o
JOIN switchTable as s ON o.tid = s.tid
JOIN indexTable as i on i.wid=s.wid
WHERE i.word = 'this' or i.word='is' or i.word='a' or i.word='test'

This query can be mutch faster if your originalTable have 'a lot' records, because here the sql engine can make indexed searches. But there is a bit more work when insert a row in the original table you must make insertions in the other two tables.

The result between the runtime of the 3 queries depends on your database table size. And that you want to optimize for insertions or selections. ( the rate between insert/update and select queryes )

Upvotes: 0

Pradyut Bhattacharya
Pradyut Bhattacharya

Reputation: 5748

with full text index -

select * from anti_spam where match (atext) against ("this is a test" in boolean mode);

Upvotes: 0

Explosion Pills
Explosion Pills

Reputation: 191819

Reverse the arguments to INSTR.

WHERE INSTR('this is a test', atext)

Upvotes: 2

Related Questions