Reputation: 1674
I'm developing a web application using ASP.NET and SQL Server 2008. I have about 1,000,000 articles in my DB and I've activated Full-Text-Search.
Now, I want to search in articles and I need 3 functionality for the search textbox:
The first scenario could be easily implemented as below:
SELECT TOP 10 *
FROM sys.dm_fts_index_keywords(db_id('MyDB'), object_id('Articles'))
WHERE display_term LIKE 'Wor%'
ORDER BY document_count DESC
I don't know how to implement the second scenario. IMO I should create a table (FirstWord, SecendWord, Frequency) to store term sequences and their frequencies (extract them from article contents) and search over the first word order by frequency.
For the third scenario: I've read about spell check algorithms (ASpell,NSpell,...), I want to know that is there any SQL Server + .NET implementation for them?
Is there any solution to meet my needs? I prefer to implement all of them as SP in in SQL Server.
Update: Here is the content of a record which I should search:
Hitler and Mussolini lent much military and financial support to the Nationalist insurrection led by general Francisco Franco in Spain. The Soviet Union supported the existing government, the Spanish Republic, which showed leftist tendencies. Furthemore, over 30,000 foreign volunteers, known as the International Brigades fought against Franco. Both Germany and the USSR used this proxy war as an opportunity to test improved weapons and tactics. The deliberate Bombing of Guernica by the German Condor Legion in April 1937 contributed to widespread concerns that the next major war would include extensive terror bombing attacks on civilians.[30][31] While there were some minor pockets of resistance, the Nationalist front declared victory on 1 April 1939.[32] It should be noted that five months later, Germany attacked Poland, initiating World War II.
Upvotes: 1
Views: 3232
Reputation: 36
You could look into AmisaDB which we use. Covers all the 3 scenarios you mention using simple inbuilt sql functions.
Upvotes: 0
Reputation: 176
There are several hamming distance implementations in .net that you could rig into your app to do scenario 3.
Scenario 2 sounds like a lot of manual work on your part. Contextual metadata is a beast. Easiest backend is using a table to keep track of things, but populating the data is another story altogether.
Upvotes: 1
Reputation: 950
I would recommend using a search engine like (Lucene) SOLR for this. It is tailor made for doing just what you ask for. Now SOLR does require Java (usually with Tomcat) to run, but if you can accept that, using Solr.Net is a wonderful experience.
Upvotes: 4
Reputation: 337
Only for second query, you can use same with one change
SELECT TOP 10 *
FROM sys.dm_fts_index_keywords(db_id('MyDB'), object_id('Articles'))
WHERE display_term LIKE 'Wor %'
ORDER BY document_count DESC
just put an space bar after your word and the % sign.
Upvotes: 0
Reputation: 4737
Your query will work for second scenario as well. For the 3rd one, you may use your letters one by one having wild characters between them so your where statement may look like this:
WHERE display_term LIKE 'W_o_r%'
Upvotes: 0