Reputation: 15505
I currently have a Java program that spawns 50 threads. The goal is to watch a directory that has many files being written to it, upload those files to an FTP server, and then remove them. Right now I have a super hacky approach: each thread loops through the directory and takes a lock in a ConcurrentMap to record which image is already being processed, so that no two threads do duplicate work. It works, but it just doesn't seem right.
So the question is: in Java, what is the preferred way of watching a directory in a multithreaded program while making sure each thread only operates on a file that no other thread has?
Update: I was considering creating a thread pool, with the caveat that each thread has its own FTP client connection that I'll have to keep open and keep from timing out.
Update: What about using http://download.oracle.com/javase/tutorial/essential/io/notification.html ?
Upvotes: 4
Views: 3476
Reputation: 23273
Does the solution really need to be multi-threaded? Unless the maximum upload speed to the destination FTP server is limited per connection, surely it'd be easier to send the files one at a time?
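Sending 50 files of 1 MB each sequentially at 1 Mbps (the assumed maximum upload speed) over a single FTP connection would be no slower than sending the same 50 files concurrently at ~20 Kbps each over 50 FTP connections, wouldn't it? Either way the link has to carry the same 50 MB (about 400 Mb), which takes roughly 400 seconds at 1 Mbps.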
Upvotes: 0
Reputation: 1942
IMO, it's asking for trouble to try to write something that does this yourself. There are so many nuances to parallel batch processing that it's best to learn the API of a framework that does it for you.
In the past I've used both Spring Batch (which is open source) and Flux (which requires a license). They'll both allow you to configure jobs that watch a directory for files, and then process those files in a parallel way. As long as you're willing to invest the time in learning their APIs, then you don't need to worry about synchronization on which process is handling which files.
Just a quick note on pros/cons of Spring Batch vs Flux:
I'm sure there are other frameworks that do this too... those are just the two I'm familiar with.
Upvotes: 1
Reputation: 169531
I would set up a file-handler class which accepts a directory and has a concurrently locked nextFile method which returns the next file in the directory. This way every thread asks for a file and every thread gets a unique file.
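A minimal sketch of that idea, assuming a plain `synchronized` method is enough locking for your load. The class name `FileDispenser` and the `claimed` set are made up for illustration, and the sketch does not check whether a file is still being written.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper: hands each file in a directory to at most one caller.
class FileDispenser {
    private final Path directory;
    // Files already handed out, so a later scan never returns them again.
    // Assumption: the set is allowed to grow for the lifetime of the program.
    private final Set<Path> claimed = new HashSet<>();

    FileDispenser(Path directory) {
        this.directory = directory;
    }

    // synchronized: only one thread at a time may scan the directory and claim a file.
    synchronized Optional<Path> nextFile() throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(directory)) {
            for (Path entry : entries) {
                // Set.add returns false if the path was already claimed.
                if (Files.isRegularFile(entry) && claimed.add(entry)) {
                    return Optional.of(entry);
                }
            }
        }
        // Nothing unclaimed right now; the caller can sleep and ask again.
        return Optional.empty();
    }
}
```

Each worker would loop on `nextFile()`, upload and delete the file it gets, and back off briefly when the result is empty.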
Upvotes: 0
Reputation: 140061
Use an ExecutorService to decouple the submission of work to the threads from the threading logic itself (also take a look at the docs for the parent interface Executor to learn a bit more about their purpose).
With an ExecutorService, you simply feed work (in your case, a file) to it and threads will pick up work as they become available. There are many options and flavors of ExecutorServices you can configure: single-threaded, a maximum number of threads, unbounded thread pool, etc.
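As a rough sketch of what that looks like with a fixed-size pool: the class `UploadPool`, the method `processAll`, and the `upload` callback are placeholders I made up, and the one-hour wait is an arbitrary limit. The real FTP upload and delete would go inside the callback.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical wrapper: one task per file, executed by a bounded pool of workers.
class UploadPool {
    // Runs `upload` once for every file using `threads` worker threads,
    // then waits for the queued tasks to finish.
    static void processAll(List<Path> files, int threads, Consumer<Path> upload)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (Path file : files) {
            // Each file is submitted exactly once, so no two workers share a file.
            pool.execute(() -> upload.accept(file));
        }
        pool.shutdown();                          // stop accepting new work
        pool.awaitTermination(1, TimeUnit.HOURS); // arbitrary upper bound for this sketch
    }
}
```

Because every file is submitted as its own task, the pool's internal queue does the bookkeeping that the ConcurrentMap locks were doing by hand.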
Upvotes: 4
Reputation: 2648
Maybe have a master thread scan the directory and hand tasks out to the worker threads?
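Something along these lines, as a sketch only: `DirectoryMaster`, `scanOnce` and `takeTask` are names I invented, and it assumes simple polling of the directory rather than file-system notifications.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical master/worker hand-off: one scanner thread, many consumers.
class DirectoryMaster {
    private final Path directory;
    // Thread-safe queue shared between the master and the workers.
    private final BlockingQueue<Path> tasks = new LinkedBlockingQueue<>();
    // Touched by the master thread only, so it needs no locking.
    private final Set<Path> seen = new HashSet<>();

    DirectoryMaster(Path directory) {
        this.directory = directory;
    }

    // Master thread only: queue every regular file that was not queued before.
    // Returns how many new files were queued in this pass.
    int scanOnce() throws IOException {
        int added = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(directory)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry) && seen.add(entry)) {
                    tasks.add(entry);
                    added++;
                }
            }
        }
        return added;
    }

    // Worker threads: block until a file is available, then own it exclusively.
    Path takeTask() throws InterruptedException {
        return tasks.take();
    }
}
```

The master would call `scanOnce()` in a loop with a short sleep, while each worker loops on `takeTask()`. Since only one thread ever decides which files get queued, the workers need no coordination among themselves.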
Upvotes: 1