GoldShip
GoldShip

Reputation: 139

What's the difference between many CONTAINS and only one?

I have a search engine that build query dynamically to an SQL Server database and I'm having different results with these queries:

CONTAINS('goal AND of AND Chelsea');

AND

CONTAINS('goal') AND CONTAINS('of') AND CONTAINS('Chelsea')

The first one returns me results and the second nothing.

I used these commands for Full-Text Search engine accept noise words:

sp_configure 'show advanced options', 1
RECONFIGURE
GO
sp_configure 'transform noise words', 1
RECONFIGURE
GO

Now, the first sql command is working fine, but the second, that is the command my search engine generates, doesn't work.

Upvotes: 1

Views: 110

Answers (1)

Martin Smith
Martin Smith

Reputation: 453807

"of" is a noise word.

I can reproduce your results with "transform noise words" on and inserting the following into a full text indexed table

INSERT INTO [dbo].[testtable]
            ([id],
             [text])
VALUES      (1,'goal'),
            (2,'Chelsea'),
            (3,'of'),
            (4,'Drogba of Chelsea scored a goal')

After the index is populated

SELECT *
FROM   sys.dm_fts_index_keywords(db_id('fts'), object_id('testtable')) 

Returns

+--------------------------------+--------------+-----------+----------------+
|            keyword             | display_term | column_id | document_count |
+--------------------------------+--------------+-----------+----------------+
| 0x006300680065006C007300650061 | chelsea      |         2 |              2 |
| 0x00640072006F006700620061     | drogba       |         2 |              1 |
| 0x0067006F0061006C             | goal         |         2 |              2 |
| 0x00730063006F007200650064     | scored       |         2 |              1 |
| 0xFF                           | END OF FILE  |         2 |              4 |
+--------------------------------+--------------+-----------+----------------+

From which I conclude that this option does not affect what is indexed - simply how the queries are treated.

CONTAINS('goal AND of AND Chelsea');

becomes

CONTAINS('goal AND Chelsea');

after ignoring the noise word.

CONTAINS('of')

cannot be answered from the index and SQL Server clearly decides it is better to return false than return true for a load of rows that don't event contain that text.

I'm not sure why it doesn't evaluate as unknown rather than false though

WHERE NOT CONTAINS([text],'of')

returns all rows - including those containing the term!

enter image description here

If you can't alter your query to use the pattern that works the way you want then you will need to get around the issue by including the noise words in the full text index.

After running

ALTER FULLTEXT INDEX ON [dbo].[testtable] SET STOPLIST = OFF

and rebuilding the index it works the way you want.

Upvotes: 2

Related Questions