foz1284

Reputation: 373

directory monitoring

What is the best way for me to check for new files added to a directory? I don't think FileSystemWatcher would be suitable, as this is not an always-on service but a method that runs when my program starts up.

There are over 20,000 files in the folder structure I am monitoring. At present I am checking each file individually to see if the file path is in my database table; however, this is taking around ten minutes and I would like to speed it up if possible.

I can store the date the folder was last checked. Is it easy to get all files with a created date later than the last-checked date?

Anyone got any ideas?

Thanks

Mark

Upvotes: 5

Views: 1040

Answers (6)

Diego Pereira

Reputation: 91

You can store somewhere the timestamp of the last file that was created; it is simple and may work for you.

Upvotes: 1

JosefAssad

Reputation: 4118

Having a FileSystemWatcher service like Kevin Jones suggests is probably the most pragmatic answer, but there are some other options.

You can watch the directory with inotify if you mount it with Samba on a Linux box. That of course assumes you don't mind fragmenting your platform, but that's what inotify is there for.

And then, more correctly but with correspondingly less chance of you getting a go-ahead: if you're sitting monitoring a directory with 20K files in it, it is probably time to evolve your system architecture. Without knowing much more about your application, it sounds like a message queue might be worth looking at.

Upvotes: 0

Mick

Reputation: 81

10 minutes seems awfully long for 20,000 files. How are you going about doing the comparison? Your suggestion doesn't account for deleted files either. If you want to remove those from the database, you will have to do a full comparison.

Perhaps the problem is the database round trips. You can retrieve a known file list from the database in large chunks (or all at once), sorted alphabetically. Sort the local file list as well and walk the two lists, processing missing or new entries as you go along.
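The list walk described above could look roughly like this. It is only a sketch in Python (the same idea carries over to any language), and the function name `diff_sorted` is made up for illustration. It assumes both lists are already sorted with the same ordering, which is worth checking because a database collation may not sort paths the same way your program does.

```python
def diff_sorted(on_disk, in_db):
    """Walk two sorted lists of paths once and return (new, deleted)."""
    new, deleted = [], []
    i = j = 0
    while i < len(on_disk) and j < len(in_db):
        if on_disk[i] == in_db[j]:
            # Path is known and still present: nothing to do.
            i += 1
            j += 1
        elif on_disk[i] < in_db[j]:
            # Path exists on disk but not in the database.
            new.append(on_disk[i])
            i += 1
        else:
            # Path is in the database but no longer on disk.
            deleted.append(in_db[j])
            j += 1
    # Whatever is left over on either side has no counterpart.
    new.extend(on_disk[i:])
    deleted.extend(in_db[j:])
    return new, deleted
```

Each list is read once, so the cost is linear in the number of files, and the database list can be fetched in chunks rather than one query per file.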

Upvotes: 2

Oded

Reputation: 498914

FileSystemWatcher is not reliable, so even if you could use a service, it would not necessarily work for you.

The two options I can see are:

  1. Keep a list of files you know about and keep comparing to this list. This will allow you to see if files were added, deleted etc. Keep this list in memory, instead of querying the database for each file.
  2. As you suggest, store a timestamp and compare to that.
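For the timestamp option, a rough sketch in Python (not the asker's actual code; `files_created_since` is a hypothetical name) might be the following. It assumes the last-checked time is stored as seconds since the epoch. Note that `st_ctime` is the creation time on Windows, but on Unix it is the inode change time, so treat that field as platform-dependent.

```python
import os

def files_created_since(root, last_checked):
    """Yield paths under root whose creation time is later than last_checked.

    last_checked is assumed to be seconds since the epoch.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Creation time on Windows; inode change time on Unix.
            if os.stat(path).st_ctime > last_checked:
                yield path
```

This still touches every file, but it needs no database query per file. It will not notice deleted files, and files moved into the folder can carry an older timestamp, so it may be worth pairing with an occasional full comparison.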

Upvotes: 1

TomTom

Reputation: 62093

Your approach is the only feasible one (FileSystemWatcher lets you see changes as they happen, not check for them at startup).

Find out what takes so long. 20,000 checks should not take 10 minutes, maybe 1 at most. Your program is written inefficiently. How do you test it?

Hint: do not ask the database for each file. Get a list of all files on disk into memory, get a list of all files in the database, and compare them in memory. 20,000 SQL statements to the database are too slow; this way you need ONE to get the list.
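To illustrate the hint, here is a minimal sketch in Python using `sqlite3` as a stand-in for whatever database you actually have. The table `files` with a `path` column and the function name `find_new_and_deleted` are assumptions made up for the example.

```python
import os
import sqlite3

def find_new_and_deleted(conn, root):
    """Compare files under root with the database using a single query."""
    # Hypothetical schema: table `files` with a text column `path`.
    known = {row[0] for row in conn.execute("SELECT path FROM files")}
    # Collect every file path under root in one pass over the directory tree.
    on_disk = {
        os.path.join(dirpath, name)
        for dirpath, _dirnames, filenames in os.walk(root)
        for name in filenames
    }
    # Set differences give new files and files that have disappeared.
    return on_disk - known, known - on_disk
```

The set lookups are in memory, so the only database traffic is the one `SELECT` plus whatever inserts or deletes you do for the differences. The stored paths must be in the same form as the ones built from the walk (same root, same separators, same case) for the comparison to match.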

Upvotes: 5

Kevin Jones

Reputation: 2367

Can you write a service that runs on that machine? The service can then use FileSystemWatcher.

Upvotes: 0
