Kaypro II

Reputation: 3350

How to poll a directory and not hit a file-transfer race condition?

I am working on an application that polls a directory for new input files at a defined interval. The general process is:

  1. Input files FTP'd to landing strip directory by another app
  2. Our app wakes up
  3. List files in the input directory
  4. Atomic-move the files to a separate staging directory
  5. Kick off worker threads (via a work-distributing queue) to consume the files from the staging directory
  6. Go back to sleep
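
To make steps 3 and 4 concrete, here is a minimal Java sketch (Java is assumed only because the poller runs as a Quartz job). The class and method names are hypothetical, and both directories are assumed to sit on the same file system so that an atomic move is possible. Nothing in it can tell a finished upload from one still in progress, which is exactly the race in question.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating steps 3 and 4; not the real application code.
final class LandingStripPoller {

    /** Moves every regular file in inputDir into stagingDir and returns the staged paths. */
    static List<Path> stageFiles(Path inputDir, Path stagingDir) throws IOException {
        List<Path> staged = new ArrayList<>();
        // Step 3: list the files in the input directory.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(inputDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and other non-file entries
                }
                Path target = stagingDir.resolve(file.getFileName());
                // Step 4: atomic move. Throws AtomicMoveNotSupportedException when the
                // move cannot be atomic, e.g. across file systems. A file that is still
                // being written gets moved just like a complete one.
                Files.move(file, target, StandardCopyOption.ATOMIC_MOVE);
                staged.add(target);
            }
        }
        return staged;
    }
}
```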

I've uncovered a problem: the app sometimes picks up an input file that is incomplete because it is still being transferred. The worker thread then fails on the partial file, and recovering requires manual intervention. This is a scenario we need to avoid.

I should note that the file transfer itself completes successfully and the server ends up with a complete copy, but only after the app has already given up with an error.

I'd like to solve this in a clean way, and while I have some ideas for solutions, they all have problems I don't like.

Here's what I've considered:

  1. Force the other apps (some of which are external to our company) to initially transfer the input files to a holding directory, then atomic-move them into the input directory once they're transferred. This is the most robust idea I've had, but I don't like this because I don't trust that it will always be implemented correctly.
  2. Retry a finite number of times on error. I don't like this because it's only a partial solution: it makes assumptions about transfer time and file size that could be violated. It would also blur the line between a genuinely bad file and one that has simply not finished transferring.
  3. Watch the file sizes and only pick up a file if its size hasn't changed for a defined period of time. I don't like this because it's too complex in our environment: the poller is a non-concurrent clustered Quartz job, so I can't just keep this info in memory, because the job can bounce between servers. I could store it in the JobDetail, but this solution just feels too complicated.
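
One possible way to take the state out of idea 3 is to ask the file system instead of remembering sizes: treat a file as settled once its last-modified time is older than a quiet period. The sketch below is a swapped-in variant, not the size-tracking approach itself. The class name and the `quietPeriod` parameter are hypothetical, and it assumes the FTP server updates the modification time as it writes and that the polling server's clock roughly agrees with the file system's. Like idea 2, it cannot tell a stalled transfer from a finished one.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

// Hypothetical stateless check: nothing is kept between polls, so it does not
// matter which cluster node runs the job.
final class QuietPeriodCheck {

    /** Returns true if the file has gone unmodified for at least quietPeriod as of now. */
    static boolean isSettled(Path file, Duration quietPeriod, Instant now) throws IOException {
        Instant lastModified = Files.getLastModifiedTime(file).toInstant();
        // Time elapsed since the last write, compared against the required quiet period.
        return Duration.between(lastModified, now).compareTo(quietPeriod) >= 0;
    }
}
```

A poller would call `isSettled(file, Duration.ofMinutes(5), Instant.now())` before moving each file. The five minutes are an arbitrary example value, not a recommendation.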

I can't be the first to have encountered this problem, so I'm sure I'll get better ideas here.

Upvotes: 1

Views: 1245

Answers (1)

Les Ferguson

Reputation: 351

I had that situation once. We got the other guys to upload the files with a different extension, e.g. *.tmp, and then, once the copy had completed, rename each file to the extension my code polls for. I'm not sure whether that is as easily done when the files come in by FTP, though.
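
A rough Java sketch of that convention, covering both sides. The `.dat` extension and all class and method names are made up for illustration, and the rename only signals completion reliably if it is atomic, which assumes both names live on the same file system.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the "upload as *.tmp, rename when done" convention.
final class ExtensionConvention {

    /** Poller side: lists only files carrying the final extension, so *.tmp uploads are ignored. */
    static List<Path> listCompleted(Path inputDir) throws IOException {
        List<Path> ready = new ArrayList<>();
        // The glob is matched against each file name in the directory.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(inputDir, "*.dat")) {
            for (Path file : stream) {
                ready.add(file);
            }
        }
        return ready;
    }

    /** Sender side: renames a fully written name.tmp to name.dat in the same directory. */
    static Path publish(Path tmpFile) throws IOException {
        String name = tmpFile.getFileName().toString();
        if (!name.endsWith(".tmp")) {
            throw new IllegalArgumentException("expected a .tmp file: " + name);
        }
        String finalName = name.substring(0, name.length() - ".tmp".length()) + ".dat";
        // The rename is the "transfer complete" signal the poller waits for.
        return Files.move(tmpFile, tmpFile.resolveSibling(finalName), StandardCopyOption.ATOMIC_MOVE);
    }
}
```

Over FTP the sender would not call `publish` itself; the protocol has rename commands (RNFR/RNTO) that could play the same role, assuming the server allows renames.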

Upvotes: 3
