user2859298

Reputation: 1443

Find files in a directory by Date

We are working with an external accounts program that saves "printed documents" on a network share. Each "printed document" in the directory consists of 3 files.

We have a customer who has 120,000 files in the directory.

Currently, when a user wants to see all the "printed documents", the software loops over every file in the directory, reads each XML file, and checks whether the report belongs to that user. This takes 10 minutes every time.

We are trying to create a faster solution.

The only idea I can think of is to loop over the files once, store their contents (file name, XML details) in a database table, and record a "Last Scanned Date". On the next pass I can skip any files older than the "Last Scanned Date", or filter them with a LINQ query (borrowed from another post):

// Timestamp of the previous scan, read from the application settings.
DateTime LastCreatedDate = Convert.ToDateTime(Properties.Settings.Default["LastDateTime"]);
var filePaths = Directory.GetFiles(@"\\Printed\Reports\", "*_*.xml")
    .Select(p => new { Path = p, Date = System.IO.File.GetLastWriteTime(p) })
    .Where(x => x.Date >= LastCreatedDate)   // filter first so only new files get sorted
    .OrderBy(x => x.Date);

Is there a quicker solution?

Upvotes: 0

Views: 161

Answers (3)

Buddy

Reputation: 11028

Perhaps it's the parsing of the XML that is taking a long time? You could do a basic "grep" of all the files for the user name/id and then do the actual XML parsing on just the matching files.
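A minimal sketch of that two-pass idea, assuming the user ID appears verbatim in the file text and is stored in an element named `User` (both are placeholders, since the real XML layout isn't shown in the question). The class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class ReportFilter
{
    // Returns the paths of report files that belong to userId.
    // "User" is an assumed element name; swap in the real one.
    public static List<string> FindReportsForUser(string directory, string userId)
    {
        return Directory.EnumerateFiles(directory, "*_*.xml")
            // Pass 1: cheap substring test, no XML parser involved.
            .Where(path => File.ReadAllText(path).Contains(userId))
            // Pass 2: parse only the candidates, so a stray match
            // elsewhere in the file is rejected.
            .Where(path => XDocument.Load(path)
                .Descendants("User")
                .Any(e => e.Value == userId))
            .ToList();
    }
}
```

Note that this still reads every file across the network share, so it only helps if the XML parsing, rather than the file I/O, is the slow part. Timing both passes separately on a sample of files would show which one dominates.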

Upvotes: 0

Nathan C

Reputation: 205

Based on your use case, it looked like what you were asking for was to have a system where a user could ask for all of their printed documents. I didn't see where date was a part of the solution.

I can think of multiple quick solutions:

  1. Have a subdirectory for each user. As new files enter the main directory, parse each file and copy it to the appropriate user subdirectories (which allows a file to be associated with multiple users). This has the benefit of limiting the number of files per directory.
  2. Have a mapping (in a DB, a flat XML file, or a flat XML file per user) which maps each file to its users. Update the mapping with each new file, and keep a list of the files that have already been processed so that you don't reprocess them.
  3. Research a document management database, if a more robust solution is desired. If you want to be able to search on a lot of different types of metadata, a document management database would be a good idea.

NOTE - For ideas 1 and 2, you could process the new files as either part of a service, a task, or whenever a user makes a request for documents.
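Idea 2 could look roughly like the sketch below. It keeps the mapping in memory only to stay short; a real version would persist both the mapping and the processed-file list (DB or flat file, as described above). `ReportIndex` and the `readUsers` callback are hypothetical names; the callback stands in for whatever code extracts the users from one XML file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public sealed class ReportIndex
{
    // user -> files that belong to that user
    private readonly Dictionary<string, List<string>> _filesByUser =
        new Dictionary<string, List<string>>();
    // files that were already parsed, so they are never parsed twice
    private readonly HashSet<string> _processed =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    private readonly Func<string, IEnumerable<string>> _readUsers;

    public ReportIndex(Func<string, IEnumerable<string>> readUsers)
    {
        _readUsers = readUsers;
    }

    // Parses only files not seen before; returns how many were added.
    public int Refresh(string directory)
    {
        int added = 0;
        foreach (var path in Directory.EnumerateFiles(directory, "*_*.xml"))
        {
            if (!_processed.Add(path)) continue; // already indexed
            foreach (var user in _readUsers(path))
            {
                if (!_filesByUser.TryGetValue(user, out var files))
                    _filesByUser[user] = files = new List<string>();
                files.Add(path);
            }
            added++;
        }
        return added;
    }

    // Lookup is a dictionary hit; no file is opened.
    public IReadOnlyList<string> FilesFor(string user) =>
        _filesByUser.TryGetValue(user, out var files)
            ? files
            : (IReadOnlyList<string>)Array.Empty<string>();
}
```

The first `Refresh` still pays the full cost of reading every file, but later calls only parse files that were added since. Enumerating 120,000 names on a network share is not free either, so running `Refresh` from a service or scheduled task rather than on each user request is probably the better fit.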

Upvotes: 0

Peter Smith

Reputation: 5550

You could set up a Windows service which detects additions to the folder and then updates the database with the new entries. Thereafter, any queries on documents printed would be at the cost of a database query only.
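One way to do the detection part is `System.IO.FileSystemWatcher`. The sketch below only shows the wiring; `ReportWatcher` and the `onNewFile` callback are hypothetical, and the callback is where the XML would be parsed and the database row inserted.

```csharp
using System;
using System.IO;

public static class ReportWatcher
{
    // Watches `directory` and calls onNewFile with the full path
    // of each newly created file matching *_*.xml.
    // Dispose the returned watcher to stop watching.
    public static FileSystemWatcher Watch(string directory, Action<string> onNewFile)
    {
        var watcher = new FileSystemWatcher(directory, "*_*.xml")
        {
            NotifyFilter = NotifyFilters.FileName, // creations and renames
            IncludeSubdirectories = false
        };
        watcher.Created += (sender, e) => onNewFile(e.FullPath);
        watcher.EnableRaisingEvents = true;        // start raising events
        return watcher;
    }
}
```

Two caveats worth planning for: the `Created` event can fire while the accounts program is still writing the file, so the handler may need to retry the read, and change notifications on network shares can be dropped (the watcher's `Error` event signals a buffer overflow). A periodic full rescan as a safety net covers both cases.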

Upvotes: 1
