fmpdmb

Reputation: 1414

Filter a log file by defining regexes

I have some huge log files (50 MB; ~500K lines) that I need to start filtering the noise out of. The log files are produced by log4j and follow this basic pattern:

[log-level] date-time class etc, etc  
log-message  

I'm looking for a way to define a start regex and an end regex (or something similar) that will filter the matching entries out of the file, so I can wade through these massive files more easily. My thought is that the start regex would match the log level and the end regex would match something in the log message. I'm sure I could write a Java program to accomplish this, but I thought I'd ask the community before going down that path. Thanks in advance.


Let me expand on my question. Let's assume I have the following snippet in my log file:

[DEBUG] date-time class etc, etc  
log-message-1

[WARN] date-time class etc, etc  
log-message-2

[DEBUG] date-time class etc, etc  
log-message-3

[DEBUG] date-time class etc, etc  
log-message-1

[WARN] date-time class etc, etc  
log-message-2

[DEBUG] date-time class etc, etc  
log-message-6

I'd like a way to filter out the entries containing log-message-1 and log-message-2 (call them logEntry1 and logEntry2), so I end up with:

[DEBUG] date-time class etc, etc  
log-message-3

[DEBUG] date-time class etc, etc  
log-message-6

I would hope to accomplish this by defining sets of regex pattern pairs. In the example above, I'd define one pair for logEntry1 and another for logEntry2.

I hope that helps clarify my question.

Upvotes: 4

Views: 1523

Answers (3)

ghostdog74

Reputation: 342313

Assuming log-message-1 and log-message-2 are unique patterns:

$ awk -vRS= '!/log-message-[12]/' ORS="\n\n" file
[DEBUG] date-time class etc, etc
log-message-3

[DEBUG] date-time class etc, etc
log-message-6
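
If you want the start/end regex pairs from the question rather than a single pattern, the same "one record per blank-line-separated block" idea can be sketched in Python. This is only a rough sketch: `filter_entries` and `pairs` are names made up for illustration, and it assumes each entry is a header line followed by its message, with entries separated by blank lines.

```python
import re

def filter_entries(text, pairs):
    """Drop every entry whose header matches a start regex and whose
    message matches the end regex paired with it.

    `pairs` is a list of (start_regex, end_regex) strings (hypothetical format).
    """
    compiled = [(re.compile(start), re.compile(end)) for start, end in pairs]
    kept = []
    # Assumption: entries are separated by one or more blank lines,
    # which mirrors awk's paragraph mode (RS=).
    for entry in re.split(r"\n\s*\n", text.strip()):
        if not entry:
            continue
        # First line is the header; everything after it is the message.
        header, _, message = entry.partition("\n")
        if any(s.search(header) and e.search(message) for s, e in compiled):
            continue  # matched a pair: filter this entry out
        kept.append(entry)
    return "\n\n".join(kept) + "\n" if kept else ""
```

For the sample in the question you would call it with something like `filter_entries(text, [(r"^\[DEBUG\]", r"log-message-1"), (r"^\[WARN\]", r"log-message-2")])`. It reads the whole text into memory, which should be acceptable for a 50 MB file but is worth keeping in mind for larger ones.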

Upvotes: 4

ZyX

Reputation: 53604

(zyx:~) % echo $T
[DEBUG] date-time class etc, etc  
log-message-1

[WARN] date-time class etc, etc  
log-message-2

[DEBUG] date-time class etc, etc  
log-message-3

[DEBUG] date-time class etc, etc  
log-message-1

[WARN] date-time class etc, etc  
log-message-2

[DEBUG] date-time class etc, etc  
log-message-6
(zyx:~) % echo $T | perl -e '$_=join("", <>); s/\[DEBUG\][^\n]*\n(log-message-1|log-message-2).*?(?=\n\[(DEBUG|WARN)\]|$)//sg; s/\[WARN\].*?(?=\n\[(DEBUG|WARN)\]|$)//sg; print;'


[DEBUG] date-time class etc, etc  
log-message-3



[DEBUG] date-time class etc, etc  
log-message-6
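
The one-liner above treats the next `[DEBUG]`/`[WARN]` header as the end of an entry instead of relying on blank lines. A streaming version of that idea could look roughly like the Python sketch below. The names `iter_entries` and `filter_stream` are made up for illustration, and the level list assumes the standard log4j levels appear in brackets at the start of each header line, as in the question.

```python
import re

# Assumption: every entry starts with a bracketed log4j level at column 0.
ENTRY_START = re.compile(r"^\[(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\]")

def iter_entries(lines):
    """Group lines into entries; a new entry begins at each header line."""
    entry = []
    for line in lines:
        if ENTRY_START.match(line) and entry:
            yield "".join(entry)
            entry = []
        entry.append(line)
    if entry:
        yield "".join(entry)

def filter_stream(lines, pairs):
    """Yield only entries that match none of the (start_regex, end_regex) pairs."""
    compiled = [(re.compile(s), re.compile(e, re.S)) for s, e in pairs]
    for entry in iter_entries(lines):
        # Header is the first line; the message is the rest of the entry.
        header, _, message = entry.partition("\n")
        if not any(s.search(header) and e.search(message) for s, e in compiled):
            yield entry
```

Because it works line by line, you can pass an open file object as `lines` and write the results out with `sys.stdout.writelines(...)` without loading the whole log. Multi-line messages that contain blank lines stay attached to their header.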

Upvotes: 1

osgx

Reputation: 94185

Use awk or awk-styled perl one-liners.

Upvotes: -1
