Reputation: 1938
Is there any solution to generate list of tags (common phrases/words) from sql server full text index. I store some data in xml data type column. I would like to generate common words from that column (performance is on the first place). data changes frequently.
Upvotes: 1
Views: 356
Reputation: 543
If there already is a FTI on the table/column you can query the FT-Index (see below). If not, I would recommend to extract the relevant data out of the XML first. This can be quite demanding if the data is hidden in crude structures. It might be very easy if the XML-Schema is simple and your relevant phrases a easyly to address. Performance of the extraction might highliy depend on the XML-Schema and your needs. The extracted data might be in an extra table that can have an FTI. The building of the FTI also is a time consuming process. But again: It depends on your data.
To query all indexed word in the FTI of your_table:
select distinct display_term
from sys.dm_fts_index_keywords (db_id('your_database'), object_id('your_schem.your_table'))
--where column_id = 1
The indexed word comes "per indexed column" of your table, so filter on column_id as needed.
display_term is case sensitiv.
To filter them on "common phrases" you could use the stopword list.
select top 10 a.display_term
from sys.dm_fts_index_keywords (db_id('your_database'), object_id('your_schema.your_table')) a
left join sys.fulltext_stopwords b
--on a.display_term collate SQL_Latin1_General_CP1_CI_AS = b.stopword
on a.display_term = b.stopword COLLATE Latin1_General_BIN
and b.language_id = 1031 --or whatever you need
where b.stopword is null
Use collation as needed (case sensitive or CI)
Upvotes: 0
Reputation: 107766
forward link to Find Unique Words in One or More Columns? that asks the same thing
Upvotes: 2