neel.1708

Reputation: 315

File processing on two different machines using Spring Batch

My file processing scenario is:

 read input file -> process -> generate output file

but I have two physically different machines that are connected to one storage area, where I receive all the input files, and to one database server. There are two application servers running on these machines (one on each machine).


So how can I use Spring Batch to process the input files on both of these application servers in parallel? I mean, if there are 10 files, then 5 would be processed on server 1 (P1) and 5 on server 2 (P2). Can it be done?

Upvotes: 10

Views: 1767

Answers (4)

Michal

Reputation: 644

Here are my suggestions:

  • create a locking table in the db with the file path as primary key. Then try to insert a record with this key: if the insert succeeds, your code can continue and process the file; if it fails (an exception because a record with this primary key already exists), move on to the next file (see the sketch after this list).

  • precise scheduling, as mentioned earlier by Jimmy

  • you can try to use a message queue (like ActiveMQ, RabbitMQ, ...) to synchronize your machines
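A minimal sketch of the locking-table idea from the first point, using Spring's JdbcTemplate; the table and column names (file_lock, file_path, node_id) are assumptions, not a fixed schema:

    import org.springframework.dao.DuplicateKeyException;
    import org.springframework.jdbc.core.JdbcTemplate;

    public class FileClaimService {

        private final JdbcTemplate jdbcTemplate;

        public FileClaimService(JdbcTemplate jdbcTemplate) {
            this.jdbcTemplate = jdbcTemplate;
        }

        /** Returns true if this node won the race and may process the file. */
        public boolean tryClaim(String filePath, String nodeId) {
            try {
                // file_path is the primary key of the hypothetical file_lock
                // table, so only one insert for a given file can ever succeed.
                jdbcTemplate.update(
                        "INSERT INTO file_lock (file_path, node_id) VALUES (?, ?)",
                        filePath, nodeId);
                return true;
            } catch (DuplicateKeyException e) {
                // The other machine already claimed this file; skip it.
                return false;
            }
        }
    }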

Upvotes: 0

IceBox13

Reputation: 1358

The first thing would be to decide whether you actually want to split the files in half (5 and 5), or whether you want each server to keep processing until the work is done. If the files come in various sizes, some small and others larger, optimal parallelization might mean 6 files processed on one server and 4 on the other, or 7 and 3, if those 3 take as long as the other 7 because of the differences in size.

A very rudimentary way would be to have a database table that represents active processing. Your job could read the directory, grab the first file name, and insert into the table that the file is being processed by that JVM. If the primary key of the table is the filename, then when both servers try at the same time, one insert fails and one succeeds. The JVM that succeeds in inserting the entry wins and gets to process the file. The other has to handle that exception, pick the next file, and attempt to insert that one as a processing entry. This way each pickup goes through a centralized lock (the db table), and you get more efficient processing that adapts to file size rather than forcing an even file distribution.
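A rough sketch of that claim-then-process loop; tryClaim and process here are hypothetical stand-ins for the insert-into-lock-table step and the actual file processing:

    import java.io.File;

    public abstract class ClaimingFileProcessor {

        /** Inserts a row keyed by file name; false on duplicate-key failure. */
        protected abstract boolean tryClaim(String fileName, String jvmId);

        protected abstract void process(File file);

        public void processDirectory(File inputDir, String jvmId) {
            File[] files = inputDir.listFiles();
            if (files == null) {
                return; // directory missing or unreadable
            }
            for (File file : files) {
                // Whichever JVM inserts the row for this file first wins; the
                // loser just moves on, so a node that drew small files will
                // naturally claim more of them.
                if (tryClaim(file.getName(), jvmId)) {
                    process(file);
                }
            }
        }
    }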

Upvotes: 1

Jimmy Praet

Reputation: 2370

You could schedule a job per input file (the input file location would be a parameter of the job). Spring Batch guarantees that no two job instances with the same job parameters are created. You'll get a JobExecutionAlreadyRunningException or a JobInstanceAlreadyCompleteException if the other node has already started processing the same file.
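A minimal sketch of launching such a job, assuming a configured JobLauncher and a Job bean (the names fileProcessingJob and input.file are assumptions for illustration):

    import org.springframework.batch.core.Job;
    import org.springframework.batch.core.JobParameters;
    import org.springframework.batch.core.JobParametersBuilder;
    import org.springframework.batch.core.launch.JobLauncher;
    import org.springframework.batch.core.repository.JobExecutionAlreadyRunningException;
    import org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException;

    public class FileJobScheduler {

        private final JobLauncher jobLauncher;
        private final Job fileProcessingJob; // reads the "input.file" parameter

        public FileJobScheduler(JobLauncher jobLauncher, Job fileProcessingJob) {
            this.jobLauncher = jobLauncher;
            this.fileProcessingJob = fileProcessingJob;
        }

        public void launchForFile(String filePath) throws Exception {
            // The file path is the identifying job parameter, so both nodes
            // launching for the same file map to the same job instance.
            JobParameters params = new JobParametersBuilder()
                    .addString("input.file", filePath)
                    .toJobParameters();
            try {
                jobLauncher.run(fileProcessingJob, params);
            } catch (JobExecutionAlreadyRunningException
                    | JobInstanceAlreadyCompleteException e) {
                // The other node already picked this file up; skip it.
            }
        }
    }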

Upvotes: 4

noOneInParticulat

Reputation: 9

There's a pretty simple way of doing it. If I understand correctly, you put every file (some info about it) in the database and then remove it to create a new output. You can lock on it. Before reading a file, you check:

    for (File file : fileList.getFiles()) {
        try {
            // get the file and process it
        } catch (Exception e) {
            // could not get the file; move on to the next one
        }
    }

and in the processing step:

    lock.lock(); // a Lock guarding this particular file
    try {
        // ... process the file ...
    } finally {
        lock.unlock();
    }

Here is some information about Lock.

Upvotes: -1
