Tim Amet
Tim Amet

Reputation: 148

Azure SQL Server full-text search doesn't work for .doc .docx types

I'm trying to setup full-text search for table

DocumentBody.

I did everything to make it work:

EXEC sp_fulltext_catalog 'DocumentsCatalog', 'create';
EXEC sp_fulltext_catalog 'DocumentsCatalog', 'start_full'
EXEC sp_fulltext_table 'dbo.DocumentBody', 'create', 'DocumentsCatalog',  'PK_DocumentBody'
EXEC sp_fulltext_column 'dbo.DocumentBody', 'Value', 'add', 0, 'Type'
EXEC sp_fulltext_table 'dbo.DocumentBody', 'start_change_tracking'
EXEC sp_fulltext_table 'dbo.DocumentBody', 'update_index'

Also to prevent the questions about supporting these types of documents by my database I did:

SELECT * 
FROM sys.fulltext_document_types 
WHERE document_type IN ('.doc', '.docx', '.html')

Output:

document_type |class_id                            | path    |version       |manufacturer
--------------|------------------------------------|---------|--------------|-----------------------
.doc          |64F1276A-7A68-4190-882C-5F14B7852019|    NULL |              |
.docx         |5A98B233-3C59-4B31-944C-0E560D85E6C3|    NULL |              |
.html         |E0CA5340-4534-11CF-B952-00AA0051FE20|    NULL |  12.0.6828.0 | Microsoft Corporation

After that I inserted 2 documents with types '.html' and '.docx'.

When I searching

SELECT *
FROM DocumentBody 
WHERE CONTAINS([Value], 'some html content')

it working as expected, but when I doing the same for '.docx' content, it returns nothing.

Does anyone know why full-text search for '.docx' or '.doc' type doesn't work as expected?

Upvotes: 3

Views: 717

Answers (1)

ErikEJ
ErikEJ

Reputation: 41759

Azure SQL database full text search does not support "binary" file formats like Office and PDF, only text formats (which includes html).

You can extract the text from Office documents programmatically, and store the extracted text in a full text indexed column

Upvotes: 2

Related Questions