PKKhandelwal
PKKhandelwal

Reputation: 3

How to use containstable function of sql server for both text and html

I am working in a application in which client send the keywords from the front end for the resume and by using containstable we search column of that table for that keywords and result back to front end.

In my application we were initially saving resume details in text format and after that we start to add in the html form. Now the search functionality is working fine for the older resume text but not getting the expected result in case of Html so what can we do so that containstable search functionality will work for both of them i.e. either for text or html

Upvotes: 0

Views: 304

Answers (2)

DCaugs
DCaugs

Reputation: 494

Oy vey. You've entered the murky world of full-text indexing...it's a rough climb from here, so pack it in my friend. =)

You left out some of the finer details, so before we proceed, I am assuming two things:

  1. The column containing the resume text is a varchar(max) or nvarchar(max)

  2. You have created a full-text index on the table containing said column

Now, down to the nitty-gritty. First let me say I am by no means a SQL Server full-text index expert (is anyone really?)...what I've learned has been by painful trail-and-error, so take this for what it is. That being said, your situation sounds very similar to one I faced earlier this year in that we inherited a system which included a full-text indexed column of the text type (which we migrated to varchar(max)) and this column contained both 'plain' text and html wrapped text. The challenge we faced was that when an end user performed a search, via the front end application, against this full-text index, the query would pick up hits on both the plain text AND the html. So, for example, if the user searched for 'Roman' they may return hits from both the plain text content as well as the the html tags that reference 'Times New Roman'...not the desired behavior.

The bad news is there really is not a straightforward solution that I have found for this. The only possible SQL Server side solution that I am aware of involves converting the data type of the column to varbinary(max), creating a 'companion' column that designates the varbinary(max) column as type 'HTML', and then leveraging the Microsoft iFILTER for HTML...more on that here, and here, and some good general BOL resources here and here.

In the end, we determined this was not the path for us because:

  1. We were not convinced, even with 100% successful implementation of the iFILTER/Full-text indexing features, that it would perform as needed
  2. Converting the column to varbinary(max) had performance implications in and of itself because we would have to convert all reads and writes to and from the varbinary data type on the fly...introducing a layer of complexity for the application code and the optimizer that we were not keen on.

We ended up implementing application side logic that helped clean up the results and that sufficiently met the needs of this particular project.

I don't want to completely discourage you from trying to achieve what you are attempting, but I do want you to at least go into it with open eyes, aware of the challenges...hopefully that will save you some frustration and wasted time!

Please do the community a favor and post back any successes or lessons learned - there is scant enough information on this out there and it would be a great help to others.

Best of luck!

Upvotes: 1

Stavros Zotalis
Stavros Zotalis

Reputation: 726

Looking at the documentation msdn you can see that for a varbinary(max) or image you can specify a secondary column containing the file type of varbinary(max) column.

column_name [ TYPE COLUMN type_column_name ]

At indexing time, the Full-Text Engine uses the abbreviation in the type column of each table row to identify which full-text search filter to use for the document in column_name. The filter loads the document as a binary stream, removes the formatting information, and sends the text from the document to the word-breaker component. For more information, see Configure and Manage Filters for Search.

Upvotes: 0

Related Questions