marcinn
marcinn

Reputation: 1938

Dynamic tags generation from sql server database using full text search

Is there any solution to generate list of tags (common phrases/words) from sql server full text index. I store some data in xml data type column. I would like to generate common words from that column (performance is on the first place). data changes frequently.

Upvotes: 1

Views: 356

Answers (2)

Christian4145
Christian4145

Reputation: 543

If there already is a FTI on the table/column you can query the FT-Index (see below). If not, I would recommend to extract the relevant data out of the XML first. This can be quite demanding if the data is hidden in crude structures. It might be very easy if the XML-Schema is simple and your relevant phrases a easyly to address. Performance of the extraction might highliy depend on the XML-Schema and your needs. The extracted data might be in an extra table that can have an FTI. The building of the FTI also is a time consuming process. But again: It depends on your data.

To query all indexed word in the FTI of your_table:

select distinct display_term 
from sys.dm_fts_index_keywords (db_id('your_database'), object_id('your_schem.your_table'))
--where column_id = 1

The indexed word comes "per indexed column" of your table, so filter on column_id as needed.

display_term is case sensitiv.

To filter them on "common phrases" you could use the stopword list.

select top 10 a.display_term
from sys.dm_fts_index_keywords (db_id('your_database'), object_id('your_schema.your_table')) a
left join sys.fulltext_stopwords b 
--on a.display_term collate SQL_Latin1_General_CP1_CI_AS = b.stopword 
on a.display_term  = b.stopword COLLATE Latin1_General_BIN
and b.language_id = 1031 --or whatever you need
where b.stopword is null

Use collation as needed (case sensitive or CI)

Upvotes: 0

RichardTheKiwi
RichardTheKiwi

Reputation: 107766

forward link to Find Unique Words in One or More Columns? that asks the same thing

Upvotes: 2

Related Questions