nick78

Reputation: 261

Camel Split InputStream by length not by token

I have an input file like this:

1234AA11BB4321BS33XY...

and I want to split it into single messages like this:

Message 1: 1234AA11BB
Message 2: 4321BS33XY

Then I want to transform the records into Java objects, marshal them to XML with JAXB, and aggregate about 1000 records into each outgoing message.

Transformation and marshalling are no problem, but I can't split the String above. There is no delimiter, only the length: every record is exactly 10 characters long. I was wondering if there is an out-of-the-box solution like

split(body().tokenizeBySize(10)).streaming()

In reality each record consists of 300 characters and a file may contain 500,000 records, so I want to split an InputStream.

In other examples I saw custom iterators used for splitting, but all of them were token- or XML-based.
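A custom iterator does not have to be token based. As a minimal sketch (FixedLengthIterator is a hypothetical class, not part of Camel), an Iterator<String> can read exactly recordLength characters from a java.io.Reader on each next() call, so only one record is held in memory at a time. It assumes the records sit back to back with no line separators, and it avoids Java 7+ features so it should compile on Java 6. A bean method that wraps the file's InputStream in an InputStreamReader (with the right charset) and returns such an iterator is the kind of expression the splitter should be able to walk in streaming mode.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Hypothetical fixed-length record iterator: each next() returns the next
 * recordLength characters read from the underlying Reader.
 * Uses no Java 7+ features (no diamond, no try-with-resources).
 */
public class FixedLengthIterator implements Iterator<String> {

    private final Reader reader;
    private final int recordLength;
    private String nextRecord; // record read ahead, or null at end of input

    public FixedLengthIterator(Reader reader, int recordLength) {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("recordLength must be positive");
        }
        this.reader = reader;
        this.recordLength = recordLength;
        this.nextRecord = readRecord();
    }

    // Reader.read may return fewer chars than requested, so keep reading
    // until the buffer is full or the stream is exhausted.
    private String readRecord() {
        char[] buffer = new char[recordLength];
        int filled = 0;
        try {
            while (filled < recordLength) {
                int n = reader.read(buffer, filled, recordLength - filled);
                if (n == -1) {
                    break;
                }
                filled += n;
            }
        } catch (IOException e) {
            throw new IllegalStateException("Failed to read record", e);
        }
        if (filled == 0) {
            return null; // clean end of input
        }
        // A trailing partial record is returned as-is; you may prefer to reject it.
        return new String(buffer, 0, filled);
    }

    @Override
    public boolean hasNext() {
        return nextRecord != null;
    }

    @Override
    public String next() {
        if (nextRecord == null) {
            throw new NoSuchElementException();
        }
        String current = nextRecord;
        nextRecord = readRecord(); // read ahead one record only
        return current;
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException("remove");
    }

    public static void main(String[] args) {
        Iterator<String> it =
            new FixedLengthIterator(new StringReader("1234AA11BB4321BS33XY"), 10);
        while (it.hasNext()) {
            System.out.println(it.next()); // 1234AA11BB, then 4321BS33XY
        }
    }
}
```

Reading one record ahead keeps hasNext() simple and side-effect free, at the cost of touching the stream once in the constructor.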

Any idea?

By the way, we are bound to Java 6 and Camel 2.13.4.

Thanks, Nick

Upvotes: 5

Views: 2654

Answers (2)

Miloš Milivojević

Reputation: 5369

The easiest way would be to split by the empty string, .split().tokenize("", 10).streaming(), so that the tokenizer treats each character as a token and groups 10 tokens (characters) together into one message. The resulting 10-character messages can then be aggregated, e.g.:

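import java.util.List;

import org.apache.camel.Exchange;
import org.apache.camel.Message;
import org.apache.camel.Processor;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.processor.aggregate.GroupedMessageAggregationStrategy;

// Class name is illustrative
public class FixedLengthSplitRoute extends RouteBuilder {

  @Override
  public void configure() throws Exception {
    from("file:src/data?delay=3000&noop=true")
        // empty token: every character is a token; 10 tokens form one message
        .split().tokenize("", 10).streaming()
        .aggregate().constant(true) // all messages have the same correlator
          .aggregationStrategy(new GroupedMessageAggregationStrategy())
          .completionSize(1000)
          .completionTimeout(5000) // use a timeout or a predicate
                                   // to know when to stop
        .process(new Processor() { // process the aggregate
          @Override
          @SuppressWarnings("unchecked")
          public void process(final Exchange e) throws Exception {
            final List<Message> aggregatedMessages =
                (List<Message>) e.getIn().getBody();
            StringBuilder builder = new StringBuilder();
            for (Message message : aggregatedMessages) {
              builder.append(message.getBody()).append("-");
            }
            e.getIn().setBody(builder.toString());
          }
        })
        .log("Got ${body}")
        .delay(2000); // artificial delay, used here to observe memory usage
  }
}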
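The aggregation step is easier to reason about in isolation. As a minimal in-memory sketch (plain Java, no Camel; BatchingSketch and batch are hypothetical names), this is roughly what completionSize(1000) plus a completion timeout amounts to for a single correlation key: a batch is released as soon as it reaches the size limit, and the leftover records form a final, smaller batch. In the route that last batch is released by the timeout, not by end of input, so this only approximates the aggregator's behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchingSketch {

    /**
     * Collects records into batches of batchSize, preserving arrival order.
     * Full batches are emitted at the size limit (like completionSize);
     * remaining records become a last, smaller batch.
     */
    public static <T> List<List<T>> batch(Iterable<T> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<List<T>>();
        List<T> current = new ArrayList<T>();
        for (T record : records) {
            current.add(record);
            if (current.size() == batchSize) {
                batches.add(current);
                current = new ArrayList<T>();
            }
        }
        if (!current.isEmpty()) {
            batches.add(current); // partial batch at end of input
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        // Prints [[r1, r2], [r3, r4], [r5]]
        System.out.println(batch(records, 2));
    }
}
```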

EDIT

Here's my memory consumption in streaming mode with 2s delay for a 100MB file:

(Image: memory consumption graph, streaming mode, 2s delay, 100MB file)

Upvotes: 4

Souciance Eqdam Rashti

Reputation: 3193

Why not let a plain Java class do the splitting and reference it from the route? See here: http://camel.apache.org/splitter.html

The following code example is taken from the documentation.

The Java DSL below uses method() to call the splitBody method defined in a separate bean class:

from("direct:body")
        // here we use a POJO bean mySplitterBean to do the split of the payload
        .split().method("mySplitterBean", "splitBody")

The splitter bean itself returns each split part:

import java.util.ArrayList;
import java.util.List;

public class MySplitterBean {

    /**
     * The split body method returns something that is iterable, such as a java.util.List.
     *
     * @param body the payload of the incoming message
     * @return a list containing each split part
     */
    public List<String> splitBody(String body) {
        // since this is based on a unit test you can of course
        // use different logic for splitting, as Camel has out
        // of the box support for splitting a String based on comma,
        // but this is for show and tell: since this is Java code
        // you have full control over how you split your messages
        List<String> answer = new ArrayList<String>();
        String[] parts = body.split(",");
        for (String part : parts) {
            answer.add(part);
        }
        return answer;
    }
}
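The documentation example splits on commas; for the fixed-length records in the question only the loop changes. A minimal sketch follows (FixedLengthSplitterBean is a hypothetical name; it would have to be registered in the registry under the name used in .method(...)). Note that it still receives the whole body as one String and builds the complete List before Camel sees the first record, so for 500,000 records of 300 characters (150 million characters) it is not a streaming solution; that would need a method returning an Iterator that reads from the InputStream instead.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical fixed-length variant of MySplitterBean: instead of splitting
 * on commas, cut the body into consecutive records of recordLength chars.
 */
public class FixedLengthSplitterBean {

    private final int recordLength;

    public FixedLengthSplitterBean(int recordLength) {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("recordLength must be positive");
        }
        this.recordLength = recordLength;
    }

    public List<String> splitBody(String body) {
        List<String> answer = new ArrayList<String>();
        for (int start = 0; start < body.length(); start += recordLength) {
            // The last record is shorter if the length is not a whole multiple
            int end = Math.min(start + recordLength, body.length());
            answer.add(body.substring(start, end));
        }
        return answer;
    }

    public static void main(String[] args) {
        // Prints [1234AA11BB, 4321BS33XY]
        System.out.println(new FixedLengthSplitterBean(10).splitBody("1234AA11BB4321BS33XY"));
    }
}
```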

Upvotes: -1
