Background:
I have a SharePoint Foundation 2010 installation that stores scanned images of paper documents, giving us an electronic version of the paper file folders we keep for each of our company's clients. All of the documents are stored as PDF files.
The configuration includes a web server hosting SharePoint and the Search Server 2010 Express service, plus a separate database server hosting both the content data and the search crawl store. Both the SharePoint/Search box and the SQL box are VMware VMs running on shared hosts (including a shared SAN) alongside our other production servers.
Each file must be added to SharePoint through a custom interface that also records metadata tags for client information (a site content type with a set of site columns defines this extra metadata). We then expose this client-identifying data to the search server by mapping it to Managed Properties, so we can query the search web service with WHERE CustomClientID = X.
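For illustration, a query of this kind can be sketched as the XML "query packet" passed to the `QueryEx` method of the search web service (`/_vti_bin/search.asmx`), using the SQL (`MSSQLFT`) query syntax. This is a minimal sketch, not our actual client code: the function name `build_query_packet`, the selected columns, and the assumption that `CustomClientID` is a text managed property (drop the quotes for an integer property) are all hypothetical, and the element layout should be checked against the Search web service documentation.

```python
from xml.sax.saxutils import escape


def build_query_packet(client_id: str, start_at: int = 1, count: int = 50) -> str:
    """Build a QueryPacket filtering on the CustomClientID managed property.

    Hypothetical helper; assumes CustomClientID is a text managed property.
    """
    # Double any single quotes so the value stays inside the SQL string literal.
    sql_literal = client_id.replace("'", "''")
    sql = (
        "SELECT Title, Path, CustomClientID FROM SCOPE() "
        f"WHERE \"CustomClientID\" = '{sql_literal}'"
    )
    # The SQL text is embedded as XML element content, so &, < and > are escaped.
    return (
        '<QueryPacket xmlns="urn:Microsoft.Search.Query" Revision="1000">'
        '<Query domain="QDomain">'
        "<SupportedFormats>"
        "<Format>urn:Microsoft.Search.Response.Document.Document</Format>"
        "</SupportedFormats>"
        "<Context>"
        f'<QueryText language="en-US" type="MSSQLFT">{escape(sql)}</QueryText>'
        "</Context>"
        # Paging: first result to return (1-based) and page size.
        f"<Range><StartAt>{start_at}</StartAt><Count>{count}</Count></Range>"
        "</Query>"
        "</QueryPacket>"
    )


if __name__ == "__main__":
    print(build_query_packet("C-1042"))
```

The resulting string would be sent as the `queryXml` argument of `QueryEx`. Because the filter is on a managed property rather than document text, a query like this only depends on the metadata being crawled, which is why the question below asks whether the content itself can be skipped.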
Our data currently resides in two large document libraries, one for each arm of the company.
After a few years of operation the server now holds some 250,000 documents, and we are having two problems: full crawls (run weekly, off hours) sometimes crash partway through, and incremental crawls (run every 5 minutes during work hours) take 7-8 minutes to pick up 2-3 new files.
Question:
I was wondering whether there is a way to make the search server crawler pick up only the metadata we supply and ignore the document contents entirely, which I assume would speed up the crawl by orders of magnitude. I believe this feature is called full-text search, but I haven't been able to find anything explaining whether it can be turned off.
If not, is there another way to speed up crawl times that anyone would recommend?