MWS

Reputation: 103

Searching through hundreds of HTML files

I am not sure how to start solving this problem, so any suggestions would help.

My client has a number of static HTML pages, running into hundreds of files. These undergo updates every now and then and are overwritten on the website. We list these pages on the website via a simple left-hand-side explorer that mimics the folder structure in which the files are given to us.

We now want to let visitors search these files and display the matching results. A brute-force search through that many files on every query is going to be very time-consuming. Matching related words (for example, plurals and misspellings) is also desirable, and showing results in order of popularity would be a useful feature. I am not sure how to get started on this. Should we pre-process the HTML files after every update, for instance? Are there any recommended indexing libraries for .NET? What little programming has been done on the website was done in C#.

Thanks, MS

Upvotes: 1

Views: 604

Answers (3)

a coder

Reputation: 7669

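Lucene.Net, the .NET port of the Apache Lucene search library, may be of interest. It builds an index of the documents ahead of time, so a search does not have to read every file, and its analyzers and fuzzy queries can help with plurals and misspellings.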
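To illustrate what an indexing library such as Lucene.Net does for you, here is a minimal sketch of the underlying idea, an inverted index, in plain Python. It is not the Lucene.Net API. The index is built once per update rather than on every search: each word maps to the set of pages containing it, a crude plural-stripping step stands in for a real stemmer, `difflib.get_close_matches` substitutes a near-miss spelling, and results are ordered by a hypothetical `popularity` mapping (for example, page-view counts you would record yourself). The names `normalize`, `tokenize`, `InvertedIndex` and `popularity` are illustrative only.

```python
import difflib
import re
from collections import defaultdict


def normalize(word):
    """Lower-case a token and drop a trailing plural 's'.

    Crude stand-in for a real stemmer (it also turns "this" into "thi").
    """
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]
    return word


def tokenize(text):
    """Split text into alphanumeric words and normalize each one."""
    return [normalize(w) for w in re.findall(r"[A-Za-z0-9]+", text)]


class InvertedIndex:
    """Hypothetical minimal index: term -> set of page paths containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, path, text):
        # Run once per page whenever the pages are updated, not on every search.
        for term in tokenize(text):
            self.postings[term].add(path)

    def search(self, query, popularity=None):
        """Return pages containing every query term, most popular first.

        A term missing from the index is swapped for its closest indexed
        spelling (difflib ratio >= 0.8); if there is none, nothing matches.
        `popularity` is a hypothetical {path: score} mapping, e.g. page views.
        """
        popularity = popularity or {}
        result = None
        for term in tokenize(query):
            if term not in self.postings:
                close = difflib.get_close_matches(term, self.postings.keys(), n=1, cutoff=0.8)
                if not close:
                    return []
                term = close[0]
            pages = self.postings[term]
            result = set(pages) if result is None else result & pages
        if not result:
            return []
        # Highest popularity first; ties broken alphabetically by path.
        return sorted(result, key=lambda p: (-popularity.get(p, 0), p))


if __name__ == "__main__":
    index = InvertedIndex()
    index.add("docs/intro.html", "Installing the widgets")
    index.add("docs/faq.html", "Widget troubleshooting and installing")
    # "instaling" is corrected to "installing"; "widget" also matches "widgets".
    print(index.search("widget instaling", popularity={"docs/faq.html": 10}))
    # ['docs/faq.html', 'docs/intro.html']
```

A library like Lucene.Net adds what this sketch leaves out: proper stemming analyzers, relevance scoring, and an index stored on disk rather than rebuilt in memory.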

Upvotes: 2

Gabriel

Reputation: 897

I'd first write a simple program to transfer the contents of all those files to a database. Then you could implement your search properly against the database, without having to read every file on each search.
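A minimal sketch of that approach in Python, using only the standard library. SQLite stands in for whatever database you actually use, and the `pages` table, the function names and the folder argument are hypothetical. Each page's markup is stripped once when the files are updated, and searches then query the stored text instead of opening the files.

```python
import sqlite3
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect the text of an HTML document, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Return the page text with runs of whitespace collapsed to single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def load_pages(conn, root):
    """Rebuild the hypothetical `pages` table from every .html file under root.

    Run after each batch of updated files is copied onto the site; the table is
    cleared first so pages that were removed do not linger.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS pages (path TEXT PRIMARY KEY, body TEXT)")
    conn.execute("DELETE FROM pages")
    for path in Path(root).rglob("*.html"):
        text = html_to_text(path.read_text(encoding="utf-8", errors="replace"))
        conn.execute("INSERT INTO pages (path, body) VALUES (?, ?)",
                     (path.relative_to(root).as_posix(), text))
    conn.commit()


def search(conn, word):
    """Return paths of pages whose text contains `word` (ASCII case-insensitive)."""
    # Escape LIKE wildcards so '%' and '_' in the input match literally.
    escaped = word.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT path FROM pages WHERE body LIKE ? ESCAPE '\\' ORDER BY path",
        (f"%{escaped}%",),
    )
    return [row[0] for row in rows]
```

You would call `load_pages(conn, root)` from whatever job copies a new batch of pages onto the site, and `search(conn, word)` from the search page. A `LIKE` query with a leading wildcard still scans every row, so this removes the file reading and HTML parsing but is not a word index; SQLite's FTS5 extension (if your build includes it) or SQL Server's full-text search would provide one, along with relevance ranking.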

Upvotes: 1

rufu5

Reputation: 66

I am not sure if it's within your budget, but, as user1161318 pointed out, Google can do this for you.

Try Google Site Search - http://www.google.co.uk/enterprise/search/products_gss.html

Upvotes: 0
