Reputation: 148
I'm trying to setup full-text search for table
I did everything to make it work:
EXEC sp_fulltext_catalog 'DocumentsCatalog', 'create';
EXEC sp_fulltext_catalog 'DocumentsCatalog', 'start_full'
EXEC sp_fulltext_table 'dbo.DocumentBody', 'create', 'DocumentsCatalog', 'PK_DocumentBody'
EXEC sp_fulltext_column 'dbo.DocumentBody', 'Value', 'add', 0, 'Type'
EXEC sp_fulltext_table 'dbo.DocumentBody', 'start_change_tracking'
EXEC sp_fulltext_table 'dbo.DocumentBody', 'update_index'
Also to prevent the questions about supporting these types of documents by my database I did:
SELECT *
FROM sys.fulltext_document_types
WHERE document_type IN ('.doc', '.docx', '.html')
Output:
document_type |class_id | path |version |manufacturer
--------------|------------------------------------|---------|--------------|-----------------------
.doc |64F1276A-7A68-4190-882C-5F14B7852019| NULL | |
.docx |5A98B233-3C59-4B31-944C-0E560D85E6C3| NULL | |
.html |E0CA5340-4534-11CF-B952-00AA0051FE20| NULL | 12.0.6828.0 | Microsoft Corporation
After that I inserted 2 documents with types '.html' and '.docx'.
When I searching
SELECT *
FROM DocumentBody
WHERE CONTAINS([Value], 'some html content')
it working as expected, but when I doing the same for '.docx' content, it returns nothing.
Does anyone know why full-text search for '.docx' or '.doc' type doesn't work as expected?
Upvotes: 3
Views: 717
Reputation: 41759
Azure SQL database full text search does not support "binary" file formats like Office and PDF, only text formats (which includes html).
You can extract the text from Office documents programmatically, and store the extracted text in a full text indexed column
Upvotes: 2