TV Nath

Reputation: 490

Split File and Process each part while using a shared resource

I am using Spring Integration to poll for a file. This single file contains multiple reports. I want to split it into individual reports and save each one as a separate file.

<int-file:inbound-channel-adapter id="filesIn"
                                  directory="file:${fileInDirectory}"
                                  filename-pattern="*.txt"
                                  prevent-duplicates="true">
    <int:poller id="poller" fixed-delay="5000"/>
</int-file:inbound-channel-adapter>

<int:service-activator input-channel="filesIn"
                       output-channel="filesOut"
                       ref="handler"/>

<int-file:outbound-channel-adapter id="filesOut"
                                   directory="file:${archiveDirectory}"
                                   delete-source-files="true"/>

The handling method inside the handler looks like this:

public List<ReportContent> splitTextToReports(File file) {
    List<ReportContent> reports = new ArrayList<>();
    // split the file
    // store each report's text in a ReportContent object
    // add it to the list
    return reports;
}

ReportContent has following fields

Each ReportContent then needs further processing.

The following method processes each element of the list returned by the method above.

public void processReportContent(ReportContent reportContent) {
    // process the report content and save the file in the relevant place
}

There are two parts to the question.

  1. How can I use a splitter right after the master file is read, so that each report is processed as one of the split objects?
  2. The service that looks up the report path should use a HashMap shared across all split objects. If the map already holds a value for the report type, the service returns it from the map. Otherwise it makes a separate API call to retrieve the report path for that report type, then stores the report type and the value returned by the API call in the map. The point of the map is to avoid unnecessary API calls.

Upvotes: 1

Views: 668

Answers (2)

Artem Bilan

Reputation: 121382

To process items in parallel, a long-standing trick with <splitter> is to put an ExecutorChannel downstream: while iterating over the split items, the splitter moves on to the next item immediately after sending the previous one.
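
As a rough plain-JDK analogy of that hand-off (not Spring Integration configuration), the sketch below submits each item to a thread pool so the loop can move on immediately. ParallelDispatch and processAll are hypothetical names used only for illustration; in a real flow the ExecutorChannel does the dispatching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, for illustration only.
class ParallelDispatch {

    // Submits every item to a fixed thread pool, then collects the results in input order.
    static <T, R> List<R> processAll(List<T> items, Function<T, R> worker, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (T item : items) {
                Callable<R> task = () -> worker.apply(item);
                futures.add(pool.submit(task)); // returns at once; the loop moves to the next item
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get()); // blocks until that item has been processed
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Report processing failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Any state shared between the workers, such as the report path map from the question, has to be thread-safe once processing runs on several threads.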

In addition, for better throughput, the splitter supports an Iterator for streaming.

I was going to suggest the FileSplitter for your task, but I guess you split not by lines but by some other identifier. Maybe your content is just XML or JSON, which makes it easy enough to determine where each part of the content ends.

Given that, it might not be so easy to provide an Iterator implementation for your case.

However, I guess it doesn't matter: you already have the split logic that builds your List<ReportContent>.
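
If streaming does turn out to matter, an Iterator along these lines is one possible shape. This is only a sketch under a loud assumption: the question never says how reports are delimited, so the "=== REPORT" marker line and the ReportIterator class are invented for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Lazily yields one report at a time instead of building the whole list up front.
class ReportIterator implements Iterator<String> {

    // Hypothetical delimiter: every report is assumed to start with a line like this.
    private static final String MARKER = "=== REPORT";

    private final BufferedReader reader;
    private String pendingHeader; // marker line already consumed for the following report
    private String next;          // the report returned by the next call to next()

    ReportIterator(BufferedReader reader) {
        this.reader = reader;
        advance();
    }

    // Reads lines up to (but not including) the next marker line.
    private void advance() {
        try {
            StringBuilder report = new StringBuilder();
            boolean started = false;
            if (pendingHeader != null) {
                report.append(pendingHeader).append('\n');
                pendingHeader = null;
                started = true;
            }
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(MARKER)) {
                    if (started) {
                        pendingHeader = line; // belongs to the following report
                        break;
                    }
                    started = true;
                }
                if (started) {
                    report.append(line).append('\n'); // lines before the first marker are skipped
                }
            }
            next = started ? report.toString() : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        String current = next;
        advance();
        return current;
    }
}
```

A splitter method could return such an Iterator instead of a List, so that each report is sent downstream as soon as it has been read.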

Regarding the ConcurrentMap:

How about taking a look at Spring's @Cacheable support for your "hard" service, so that the next call with the same key just returns the value from the cache?

To call that service when each file is written, you can use the directory-expression attribute on the <int-file:outbound-channel-adapter>:

<int-file:outbound-channel-adapter directory-expression="@reportPathService.getPath(payload)" /> 

You can apply the same technique to the file name as well.

Note: pay attention to the default header for the file name: FileHeaders.FILENAME.
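
For comparison, the lookup-then-cache behaviour asked for in the question can also be sketched by hand with ConcurrentHashMap.computeIfAbsent rather than @Cacheable. This is a minimal sketch, assuming the report type is a String and the API call can be passed in as a function; the remoteLookup parameter is a hypothetical stand-in for that call, and getPath here takes the report type rather than the whole payload.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hand-rolled cache in front of the expensive path lookup.
class ReportPathService {

    private final Map<String, String> pathsByType = new ConcurrentHashMap<>();

    // Hypothetical stand-in for the API call that resolves a report type to a path.
    private final Function<String, String> remoteLookup;

    ReportPathService(Function<String, String> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    // Returns the cached path, calling the API only the first time a type is seen.
    String getPath(String reportType) {
        return pathsByType.computeIfAbsent(reportType, remoteLookup);
    }
}
```

Because ConcurrentHashMap applies the function atomically per key, the same instance can be shared by all split messages even when they are processed on several threads.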

Upvotes: 1

Andriy Kryvtsun

Reputation: 3344

1. Instead of <int:service-activator input-channel="filesIn"... I would add a chain:

<int:chain id="processor" input-channel="filesIn" output-channel="filesOut">
    <int:splitter>
        <bean class="...your impl of org.springframework.integration.splitter.AbstractMessageSplitter..."/>
    </int:splitter>
</int:chain>

and move your splitTextToReports logic into this splitter implementation. After the splitter, the chain carries a flat stream of ReportContent instances.

2. Add a transformer step to the chain after the splitter and put your processReportContent logic there. The result of the transformation is a message with your report as a string in the payload and the file name in the 'filename' message header.

The API of your transformer might look like this:

interface ReportContentTransformer {
   Message<?> transform(ReportContent content);
}
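
To make the payload-plus-header idea concrete, here is a plain-Java sketch with no Spring types. Everything in it is an assumption: the question does not show the fields of ReportContent, so the type and text fields, the SimpleMessage stand-in for Message<?>, and the "type-sequence.txt" naming scheme are all hypothetical. In a real implementation the message would normally be built with Spring's MessageBuilder instead.

```java
import java.util.Map;

// Hypothetical stand-in for the question's ReportContent; its real fields are not shown.
class ReportContent {
    final String type;
    final String text;

    ReportContent(String type, String text) {
        this.type = type;
        this.text = text;
    }
}

// Minimal stand-in for a message: a payload plus a map of headers.
class SimpleMessage {
    final String payload;
    final Map<String, Object> headers;

    SimpleMessage(String payload, Map<String, Object> headers) {
        this.payload = payload;
        this.headers = headers;
    }
}

class ReportContentToMessage {

    // Puts the report text in the payload and a generated file name in the 'filename' header.
    static SimpleMessage transform(ReportContent content, int sequence) {
        String filename = content.type + "-" + sequence + ".txt"; // assumed naming scheme
        return new SimpleMessage(content.text, Map.of("filename", filename));
    }
}
```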

The chain will then look like this:

<int:chain id="processor" input-channel="filesIn" output-channel="filesOut">
    <int:splitter>
       ...
    </int:splitter>
    <int:transformer ref="...ref on your ReportContentTransformer interface implementation bean..." method="transform"/>
</int:chain>

3. Add this attribute to your outbound-channel-adapter:

filename-generator-expression="headers.get('filename')"

so that the file name from the 'filename' header is used when the file is stored.

Upvotes: 2
