Reputation: 1042
I have a table with 40+ million rows in MS SQL Server 2019. One of the columns stores pure hexadecimal strings (both binary and readable ASCII content). I need to search this table for rows containing a specific hex string.
Normally, I would do this:
SELECT * FROM transactionoutputs WHERE outhex LIKE '%74657374%' ORDER BY id DESC OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;
Since the results are paginated, it can take less than a second to find the first 10 results. However, when increasing the offset, or searching for strings that only appear 1-2 times in the entire table, it can take more than a minute, at which point my application will time out.
The execution plan for this query is this:
Are there any easy ways to improve the performance of such a search?
Using this answer, I was able to reduce the query time from 33 seconds to 27 seconds:
SELECT * FROM transactionoutputs WHERE
CHARINDEX('74657374' collate Latin1_General_BIN, outhex collate Latin1_General_BIN) > 0
ORDER BY id DESC OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;
When I leave out the ORDER BY and pagination, I can reduce this to 19 seconds. This is not ideal because I need both the ordering and pagination, and it still has to scan the entire table.
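For clarity, the 19-second variant without ordering and pagination is simply:

SELECT * FROM transactionoutputs
WHERE CHARINDEX('74657374' COLLATE Latin1_General_BIN, outhex COLLATE Latin1_General_BIN) > 0;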
I have tried the following:
I came across this article about slow queries. Initially, I was using parameterized queries in my application, which was much slower than running them in SSMS. I have since moved to the query shown above, but it is still slow.
I tried to enable Multiple Active Result Sets (MARS), but without any improvement in query time.
I also tried using Full-Text Search. This seemed to be the most promising solution, as text search is exactly what I need. I created a full-text index and can run a query similar to the one above, but using the index:
SELECT * FROM transactionoutputs WHERE CONTAINS(outhex, '7465') ORDER BY id DESC OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;
This returns results almost instantly. However, when the search string is longer than a few characters (often 4), it doesn't return anything. Am I doing something wrong, or why does it behave that way?
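For reference, I created the full-text index roughly like this (the catalog name and key index name here are placeholders for my actual ones):

CREATE FULLTEXT CATALOG ft_catalog AS DEFAULT;
CREATE FULLTEXT INDEX ON dbo.transactionoutputs (outhex)
    KEY INDEX PK_transactionoutputs -- placeholder: the table's unique key index
    ON ft_catalog;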
The execution plan:
My understanding is that my case is not the ideal use case for FTS, as it is designed to search readable text, not hexadecimal strings. Is it possible to use it anyway, and if so, how?
After reading dozens of articles and SO posts, I cannot confidently say I know how to improve the performance of such queries for my specific use case, if it is even possible at all. So, is there any easy option to improve this?
Upvotes: 2
Views: 1111
Reputation: 7928
First, kudos for the fantastic explanation of your problem. This helps you get better answers fast. You should also include DDL, including indexes, when possible. This will become clear as I answer your question.
I'm going to tackle a couple of issues with your query which are unrelated to how you're parsing your text now, and talk about how to handle the string problem later tonight.
Answer Part 1: Unrelated to string parsing
It's quite possible that the way you are searching through the string is not the main performance problem. Let's start with the SELECT *: do you absolutely need all the columns? Specifically, do you absolutely need all the columns that are included in that Key Lookup? This is the most important thing to sort out. Let me explain.
Your query performs a scan against your nonclustered index named outhex-index, then a Key Lookup to retrieve the columns not included in outhex-index. Key Lookups destroy performance, especially against a 40,000,000-row table.
If you do need those columns, then you should consider adding them as included columns to your outhex-index index. I say consider because I don't know how many columns there are, nor their data types. Included columns speed up queries by eliminating costly Key Lookups, but they slow down data modification, sometimes dramatically depending on the number/type of indexes. If you need the columns not included in outhex-index and they are big columns (MAX/BLOB/LOB data types, XML, etc.), then a covering index is NOT an option. If you don't need them, then you should refactor your SELECT * statement to only include the columns you need.
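For illustration, assuming the needed columns are small and named colA and colB (both hypothetical), rebuilding outhex-index as a covering index might look like this:

-- colA/colB are placeholders for the columns your SELECT actually uses
CREATE NONCLUSTERED INDEX [outhex-index]
ON dbo.transactionoutputs (outhex)
INCLUDE (colA, colB)
WITH (DROP_EXISTING = ON); -- rebuilds the existing index with the included columns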
Full text indexing is not an option here unless you can find a way to lose that sort. Sorting has N log N complexity, which means the sort gets more expensive the more rows you sort. A 40-million-row sort should be avoided whenever possible. This will be hard to avoid with full text indexing, for reasons which require more time to explain than I have right now. Adding/modifying a 40-million-row index can be expensive and take a lot of time. If you do go that route, I suggest taking an offline copy of that table to time how long it takes to build. You can also consider creating a filtered index, if possible, to reduce your search area (sketched below).
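A minimal sketch of the filtered-index idea, assuming a hypothetical column you can filter on (the spent flag here is purely illustrative):

CREATE NONCLUSTERED INDEX ix_outhex_filtered
ON dbo.transactionoutputs (outhex)
WHERE spent = 0; -- 'spent' is a placeholder; filter on whatever predicate shrinks the search area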
I noticed, too, that both queries are getting serial execution plans. I don't know if a parallel plan will help the first query with the Key Lookup, but I know that it will likely help the second one, as there is a sort involved. Parallel execution plans can really speed up sorts. Consider testing your query with OPTION (QUERYTRACEON 8649) or make_parallel() by Adam Machanic, as shown below.
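For example, a quick test against your second query (QUERYTRACEON 8649 is undocumented and typically needs elevated permissions, so treat this as a diagnostic, not a production fix):

SELECT * FROM transactionoutputs
WHERE CHARINDEX('74657374' COLLATE Latin1_General_BIN, outhex COLLATE Latin1_General_BIN) > 0
ORDER BY id DESC
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY
OPTION (QUERYTRACEON 8649); -- encourages the optimizer to pick a parallel plan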
I'll update this post tonight with some ideas to parse your string faster. One thing you can look into in the meantime is Paul White's clever Trigram Wildcard String Search trick, which might be an option.
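To give you the rough shape of that trigram idea (my own simplified sketch, not Paul White's actual code; all names here are illustrative): precompute every 3-character substring of outhex into a side table, seek a few trigrams of the search string to narrow the candidates, then verify the full pattern with LIKE on just those rows.

-- Hypothetical side table: one row per (trigram, id) pair
CREATE TABLE dbo.outhex_trigrams
(
    trigram char(3) NOT NULL,
    id      bigint  NOT NULL  -- points back to transactionoutputs.id
);
CREATE CLUSTERED INDEX cx_trigrams ON dbo.outhex_trigrams (trigram, id);

-- Search for '74657374': seek two of its trigrams, then confirm the full pattern
SELECT TOP (10) t.*
FROM dbo.transactionoutputs AS t
WHERE t.id IN (SELECT id FROM dbo.outhex_trigrams WHERE trigram = '746')
  AND t.id IN (SELECT id FROM dbo.outhex_trigrams WHERE trigram = '374')
  AND t.outhex LIKE '%74657374%'  -- cheap final check over a small candidate set
ORDER BY t.id DESC;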
Upvotes: 1