Reputation: 79
I am having trouble reading a multi-line log message as a single message in our Spring Batch application, which is configured with Spring Integration. The application has to read a multi-line log message (for example, an exception stack trace) as a single message, then process and classify it for further indexing.
Each message starts with a timestamp (matched by DATE_PATTERN in the code below) and may continue over multiple lines. I am trying to keep reading a message until I see the next timestamp by overriding the isEndOfRecord method of SimpleRecordSeparatorPolicy: when the second line arrives in preProcess, I return true from isEndOfRecord, but this does not work as expected. Could anyone help me read the log file below by identifying the timestamp pattern?
I am using org.springframework.batch.item.file.FlatFileItemReader, with org.springframework.batch.item.file.mapping.PassThroughLineMapper as the line mapper.
The complete details are below.
1) Log message file: sample-message-test.log
2013-10-19 07:05:32.253 [My First Class..] LOG LEVEl first-message-line-1 first-message-line-1 first-message-line-1 first-message-line-1 first-message-line-1 first-message-line-1
first-message-line-2 first-message-line-2 first-message-line-2
first-message-line-3 first-message-line-3 first-message-line-3
first-message-line-4 first-message-line-4 first-message-line-4
first-message-line-5 first-message-line-5
first-message-line-6
2013-10-19 07:05:32.257 [My Second Class..] LOG LEVEl second-message-line-1 second-message-line-1 second-message-line-1 second-message-line-1 second-message-line-1 second-message-line-1
second-message-line-2 second-message-line-2 second-message-line-2
second-message-line-3 second-message-line-3 second-message-line-3
second-message-line-4 second-message-line-4 second-message-line-4
second-message-line-5 second-message-line-5
second-message-line-6
2013-10-19 07:05:32.259 [My Third Class..] LOG LEVEl third-message-line-1 third-message-line-1 third-message-line-1 third-message-line-1 third-message-line-1 third-message-line-1
third-message-line-2 third-message-line-2 third-message-line-2
third-message-line-3 third-message-line-3 third-message-line-3
third-message-line-4 third-message-line-4 third-message-line-4
third-message-line-5 third-message-line-5
third-message-line-6
2) Batch Configuration file
<batch:job id="fileReadingJob">
    <batch:step id="flatFileReadingStep">
        <batch:tasklet>
            <batch:chunk reader="reader" writer="writer" commit-interval="10" />
        </batch:tasklet>
    </batch:step>
</batch:job>
<bean id="reader" class="org.springframework.batch.item.file.FlatFileItemReader" scope="step">
    <property name="lineMapper">
        <bean class="org.springframework.batch.item.file.mapping.PassThroughLineMapper"/>
    </property>
    <property name="bufferedReaderFactory">
        <bean class="org.springframework.batch.item.file.DefaultBufferedReaderFactory"/>
    </property>
    <property name="recordSeparatorPolicy">
        <bean class="com.batchlog.explorer.batchio.FlatFileRecordSeperationPolicy"/>
    </property>
    <property name="resource" value="file:///#{systemProperties['logfolder']}/#{jobParameters['inputfile']}" />
</bean>
<bean id="writer" class="com.batchlog.explorer.batchio.FlatFileWriter" scope="step"/>
........
3) Record separator policy class
import org.springframework.batch.item.file.separator.SimpleRecordSeparatorPolicy;

public class FlatFileRecordSeperationPolicy extends SimpleRecordSeparatorPolicy {

    // Marker strings inserted into the line to track where a record starts, continues and ends
    public static final String STARTING_OF_THE_LINE = "-STARTING_OF_THE_LINE-";
    public static final String CONTINUATION_OF_THE_FILE = "-CONTINUATION_OF_THE_FILE-";
    public static final String END_OF_THE_LINE = "-END_OF_THE_LINE-";
    public static final String END_OF_THE_LINE_CHARACER = " \n ";
    public static final String DATE_PATTERN ="^(?>\\d\\d){1,2}-(?:0?[1-9]|1[0-2])-(\\s)?(?:2[0123]|[01][0-9]):? (?:[0-5][0-9])(?::?(?:(?:[0-5][0-9]|60)(?:[.,][0-9]+)?))?(?:Z|[+-](?:2[0123]|[01][0-9])(?::?(?:[0-5][0-9])))?.*?";

    @Override
    public boolean isEndOfRecord(String line) {
        if (line.matches(DATE_PATTERN) || line.startsWith(STARTING_OF_THE_LINE)
                || line.contains(CONTINUATION_OF_THE_FILE) || line.startsWith(END_OF_THE_LINE)) {
            if (isNextLineStarts(line) || line.startsWith(END_OF_THE_LINE)) {
                return true; // to break line
            }
        }
        return false; // to continue line
    }

    // True when the text after the continuation marker starts a new timestamped message
    private boolean isNextLineStarts(String preProcessOfLine) {
        if (preProcessOfLine.contains(CONTINUATION_OF_THE_FILE) && !preProcessOfLine.endsWith(CONTINUATION_OF_THE_FILE)) {
            String[] lines = preProcessOfLine.split(CONTINUATION_OF_THE_FILE);
            if (lines[1].trim().matches(DATE_PATTERN)) {
                return true;
            }
        }
        return false;
    }

    @Override
    public String preProcess(String line) {
        if (line.matches(DATE_PATTERN) && !line.contains(CONTINUATION_OF_THE_FILE)) {
            line = new StringBuilder(STARTING_OF_THE_LINE).append(line).toString();
        } else if (line.startsWith(STARTING_OF_THE_LINE) && !line.contains(CONTINUATION_OF_THE_FILE)) {
            line = new StringBuilder(line.substring(STARTING_OF_THE_LINE.length())).append(CONTINUATION_OF_THE_FILE).toString();
        } else if (line.contains(CONTINUATION_OF_THE_FILE) && !line.endsWith(CONTINUATION_OF_THE_FILE)) {
            String[] lines = line.split(CONTINUATION_OF_THE_FILE);
            if (lines[1].trim().matches(DATE_PATTERN)) {
                line = new StringBuilder(END_OF_THE_LINE).append(lines[0]).toString();//.append(lines[1]).toString();
            } else {
                line = new StringBuilder(lines[0]).append(lines[1]).append(CONTINUATION_OF_THE_FILE).toString();
            }
        }
        return super.preProcess(line);
    }

    @Override
    public String postProcess(String record) {
        if (record.startsWith(END_OF_THE_LINE)) {
            record = new StringBuilder(record.substring(END_OF_THE_LINE.length())).toString();
        } else if (record.contains(CONTINUATION_OF_THE_FILE) && !record.endsWith(CONTINUATION_OF_THE_FILE)) {
            String[] lines = record.split(CONTINUATION_OF_THE_FILE);
            if (lines[1].trim().matches(DATE_PATTERN)) {
                record = new StringBuilder(END_OF_THE_LINE).append(lines[0]).toString();
            } else {
                record = new StringBuilder(lines[0]).append(lines[1]).toString();
            }
        }
        return super.postProcess(record);
    }
}
Upvotes: 2
Views: 7691
Reputation: 12043
Your problem does not fit into the RecordSeparatorPolicy.isEndOfRecord(String) paradigm.
isEndOfRecord works well when the end-of-record marker is on the last line of the record.
For example, DefaultRecordSeparatorPolicy checks that the record has an even number of quotes, and the closing quote is part of the record itself. In your case the end of a record is known only once the first line of the next record has been read, so you will be over-reading by one line.
Your basic idea of using postProcess and preProcess might work, but you will still get a FlatFileParseException from FlatFileItemReader on the last record, when you reach the end of the file and readLine returns null. See applyRecordSeparatorPolicy(String line) in FlatFileItemReader:
private String applyRecordSeparatorPolicy(String line) throws IOException {
    String record = line;
    while (line != null && !recordSeparatorPolicy.isEndOfRecord(record)) {
        line = this.reader.readLine();
        if (line == null) {
            if (StringUtils.hasText(record)) {
                // A record was partially complete since it hasn't ended but
                // the line is null
                throw new FlatFileParseException("Unexpected end of file before record complete", record, lineCount);
            }
            else {
                // Record has no text but it might still be post processed
                // to something (skipping preProcess since that was already
                // done)
                break;
            }
        }
        else {
            lineCount++;
        }
        record = recordSeparatorPolicy.preProcess(record) + line;
    }
    return recordSeparatorPolicy.postProcess(record);
}
In that case your output file will be missing lines, depending on the commit-interval and the isEndOfRecord logic.
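To make the over-reading concrete, here is a rough, self-contained simulation of that loop in plain Java. It is only a sketch and not the Spring Batch code itself: OverReadDemo, its TIMESTAMP pattern and the simplified isEndOfRecord/preProcess are hypothetical stand-ins, and IllegalStateException stands in for FlatFileParseException. The policy here ends a record only once the newest appended line starts with a timestamp, which is the kind of look-ahead your log format needs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Pattern;

class OverReadDemo {
    // Assumed timestamp layout, e.g. "2013-10-19 07:05:32.253"
    static final Pattern TIMESTAMP =
            Pattern.compile("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}[.,]\\d{3}");

    // Look-ahead policy: the record is "complete" only when the line appended
    // last (the text after the final line break) starts with a timestamp.
    static boolean isEndOfRecord(String record) {
        int lastBreak = record.lastIndexOf('\n');
        return lastBreak >= 0
                && TIMESTAMP.matcher(record.substring(lastBreak + 1)).lookingAt();
    }

    // Keep a line break between the lines that the loop concatenates.
    static String preProcess(String record) {
        return record + "\n";
    }

    // Simplified version of the reader loop: keep appending lines until the
    // policy reports that the record is complete.
    static String readRecord(BufferedReader reader) throws IOException {
        String line = reader.readLine();
        if (line == null) {
            return null; // no more input
        }
        String record = line;
        while (!isEndOfRecord(record)) {
            line = reader.readLine();
            if (line == null) {
                // Stand-in for FlatFileParseException
                throw new IllegalStateException("Unexpected end of file before record complete");
            }
            record = preProcess(record) + line;
        }
        return record;
    }

    public static void main(String[] args) throws IOException {
        String log = "2013-10-19 07:05:32.253 first-message-line-1\n"
                + "first-message-line-2\n"
                + "2013-10-19 07:05:32.257 second-message-line-1\n"
                + "second-message-line-2\n";
        BufferedReader reader = new BufferedReader(new StringReader(log));
        // The first record already contains the first line of the second message.
        System.out.println(readRecord(reader));
        try {
            readRecord(reader); // starts at "second-message-line-2" and never finds an end
        } catch (IllegalStateException e) {
            System.out.println("Failed: " + e.getMessage());
        }
    }
}
```

In this sketch the first record swallows the first line of the second message, and the next read fails at the end of the file because nothing ever closes the last record. You can strip the extra line in postProcess, but the reader has already consumed it from the stream.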
So basically I suggest using a different approach. Did bellabax's solution work for you?
Upvotes: 0
Reputation: 18413
Write your own ItemReader, as described in the multiorder-line example or in this post.
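As a rough illustration of the idea, here is a minimal sketch in plain Java with no Spring Batch dependency, so it can be run on its own. MultiLineLogReader, RECORD_START and the timestamp layout are assumptions made for this example. The read() method follows the ItemReader convention of returning null when the input is exhausted, so in a real job the class could implement ItemReader<String> and read its lines from a delegate reader instead of a BufferedReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Pattern;

class MultiLineLogReader {
    // Assumed layout of the timestamp that opens every log message
    private static final Pattern RECORD_START =
            Pattern.compile("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}[.,]\\d{3}");

    private final BufferedReader source;
    // First line of the next message, read ahead while finishing the previous one
    private String pending;

    MultiLineLogReader(BufferedReader source) {
        this.source = source;
    }

    /** Returns the next complete log message, or null when the input is exhausted. */
    String read() throws IOException {
        String first = (pending != null) ? pending : source.readLine();
        pending = null;
        if (first == null) {
            return null;
        }
        StringBuilder record = new StringBuilder(first);
        String line;
        while ((line = source.readLine()) != null) {
            if (RECORD_START.matcher(line).lookingAt()) {
                pending = line; // belongs to the next message, keep it for the next call
                break;
            }
            record.append('\n').append(line);
        }
        // End of file simply closes the last message instead of raising an error.
        return record.toString();
    }

    public static void main(String[] args) throws IOException {
        String log = "2013-10-19 07:05:32.253 [My First Class..] first-message-line-1\n"
                + "first-message-line-2\n"
                + "2013-10-19 07:05:32.257 [My Second Class..] second-message-line-1\n"
                + "second-message-line-2\n";
        MultiLineLogReader reader = new MultiLineLogReader(new BufferedReader(new StringReader(log)));
        String message;
        while ((message = reader.read()) != null) {
            System.out.println("--- message ---");
            System.out.println(message);
        }
    }
}
```

Because the look-ahead line is kept in a field between calls, nothing is lost at record boundaries. Lines that appear before the first timestamp come back as a record of their own. For a restartable job the pending line would also have to be accounted for when saving state, which this sketch does not attempt.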
Upvotes: 3