Fabianius

Reputation: 715

Find and replace over multiple lines in a huge text file

I have a huge game-server log file (almost 6 GB). Besides the useful records that need to be kept, it is filled with millions of errors (hundreds were logged every second at the time). I'd like to remove every line that belongs to an error while keeping the lines that show chat messages or other information.

However, I can't simply delete the unwanted lines, because the error messages aren't always the same and span a varying number of lines, so no fixed pattern tells me which lines belong to an error. I need a regular expression for that. I've been looking for a program that fits my purpose but haven't found one yet. sed (the stream editor), for instance, could process such a huge file without needing many resources, but as far as I can tell it doesn't support finding and replacing across multiple lines.

Therefore, is there a program that supports finding and replacing regular expressions in huge text files over multiple lines? Or is it recommended to write your own script to do that job?

The log file looks as follows:

2011-03-02 01:43:00 [INFO] <admin> CraftBook is causing errors. 
2011-03-02 01:43:01 [SEVERE] Could not pass event REDSTONE_CHANGE to CraftBookMechanisms
java.lang.NoSuchMethodError: com.sk89q.worldedit.blocks.BlockType.isRedstoneBlock(I)Z
    at com.sk89q.craftbook.bukkit.MechanicListenerAdapter$MechanicBlockListener.onBlockRedstoneChange(MechanicListenerAdapter.java:174)
    at net.minecraft.server.BlockButton.a(BlockButton.java:170)
    at net.minecraft.server.ItemInWorldManager.a(ItemInWorldManager.java:160)
    at net.minecraft.server.NetServerHandler.a(NetServerHandler.java:482)
    at net.minecraft.server.Packet15Place.a(SourceFile:57)
    at net.minecraft.server.NetworkManager.a(SourceFile:230)
    at net.minecraft.server.NetServerHandler.a(NetServerHandler.java:75)
    at net.minecraft.server.NetworkListenThread.a(SourceFile:100)
    at net.minecraft.server.MinecraftServer.h(MinecraftServer.java:357)
    at net.minecraft.server.MinecraftServer.run(MinecraftServer.java:272)
    at net.minecraft.server.ThreadServerApplication.run(SourceFile:366)
2011-03-02 01:43:01 [INFO] <admin> Is it working yet? 
2011-03-02 01:43:01 [INFO] <admin> Not really. 
2011-03-02 01:43:01 [SEVERE] Could not pass event REDSTONE_CHANGE to CraftBookMechanisms
java.lang.NoSuchMethodError: com.sk89q.worldedit.blocks.BlockType.isRedstoneBlock(I)Z
    at com.sk89q.craftbook.bukkit.MechanicListenerAdapter$MechanicBlockListener.onBlockRedstoneChange(MechanicListenerAdapter.java:174)
    at net.minecraft.server.MinecraftServer.h(MinecraftServer.java:348)
    at net.minecraft.server.MinecraftServer.run(MinecraftServer.java:272)
    at net.minecraft.server.ThreadServerApplication.run(SourceFile:366)
2011-03-02 01:43:02 [INFO] <admin> I hope we find a solution as soon as ever possible. 

The desired result would be the following:

2011-03-02 01:43:00 [INFO] <admin> CraftBook is causing errors.
2011-03-02 01:43:01 [INFO] <admin> Is it working yet? 
2011-03-02 01:43:01 [INFO] <admin> Not really. 
2011-03-02 01:43:02 [INFO] <admin> I hope we find a solution as soon as ever possible.

As you can see, the log file contains the same error over and over again. Even though it always starts with the date and time followed by [SEVERE] Could not pass event REDSTONE_CHANGE to CraftBookMechanisms and ends with at net.minecraft.server.ThreadServerApplication.run(SourceFile:366), the stack trace in between is different each time. That's why I can't just replace the error message with an empty string.

Is there a regular expression that could help me get rid of all the lines belonging to an error while keeping the remaining lines? That way, my log file would shrink to under 50 MB, the size it had before a broken plugin made my server produce all these errors.

Upvotes: 2

Views: 1287

Answers (2)

Josh Rosen

Reputation: 13821

This Python script makes one pass through a logfile read from stdin, printing the filtered log messages to stdout.

It uses a regular expression to match lines that mark the beginning of a log message (such as a line that starts with 2011-03-02 01:43:00 [).

If a line that begins a log message contains [SEVERE] Could not pass event REDSTONE_CHANGE to CraftBookMechanisms, the script discards all lines between that line and the line containing the start of the next log message. Otherwise, it outputs the line. You can think of this as a finite state machine with two states, which correspond to whether the script is skipping over lines or outputting lines.

import re
import sys

# A line that starts with a timestamp marks the beginning of a log message.
START_OF_MESSAGE_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
# The specific error whose whole message (header plus stack trace) is dropped.
ERROR_RE = re.compile(
    START_OF_MESSAGE_RE.pattern
    + r" \[SEVERE\] Could not pass event REDSTONE_CHANGE to CraftBookMechanisms$"
)
skip_until_next_message = False

for line in sys.stdin:
    line = line.rstrip()
    if START_OF_MESSAGE_RE.match(line):
        # Each new message decides whether to skip or output until the next one.
        skip_until_next_message = bool(ERROR_RE.match(line))
    if not skip_until_next_message:
        print(line)

I added some special cases to the log file for testing. Here's the log file that I tested it with:

2011-03-02 01:43:00 [INFO] <admin> CraftBook is causing errors. 
2011-03-02 01:43:01 [SEVERE] Could not pass event REDSTONE_CHANGE to CraftBookMechanisms
java.lang.NoSuchMethodError: com.sk89q.worldedit.blocks.BlockType.isRedstoneBlock(I)Z
    at com.sk89q.craftbook.bukkit.MechanicListenerAdapter$MechanicBlockListener.onBlockRedstoneChange(MechanicListenerAdapter.java:174)
    at net.minecraft.server.BlockButton.a(BlockButton.java:170)
    at net.minecraft.server.ItemInWorldManager.a(ItemInWorldManager.java:160)
    at net.minecraft.server.NetServerHandler.a(NetServerHandler.java:482)
    at net.minecraft.server.Packet15Place.a(SourceFile:57)
    at net.minecraft.server.NetworkManager.a(SourceFile:230)
    at net.minecraft.server.NetServerHandler.a(NetServerHandler.java:75)
    at net.minecraft.server.NetworkListenThread.a(SourceFile:100)
    [SEVERE] Could not pass event REDSTONE_CHANGE to CraftBookMechanisms
    at net.minecraft.server.MinecraftServer.h(MinecraftServer.java:357)
    at net.minecraft.server.MinecraftServer.run(MinecraftServer.java:272)
    at net.minecraft.server.ThreadServerApplication.run(SourceFile:366)
2011-03-02 01:43:01 [INFO] <admin> Is it working yet? 
2011-03-02 01:43:01 [INFO] <admin> Not really. 
2011-03-02 01:43:01 [SEVERE] Could not pass event REDSTONE_CHANGE to CraftBookMechanisms
java.lang.NoSuchMethodError: com.sk89q.worldedit.blocks.BlockType.isRedstoneBlock(I)Z
    at com.sk89q.craftbook.bukkit.MechanicListenerAdapter$MechanicBlockListener.onBlockRedstoneChange(MechanicListenerAdapter.java:174)
    at net.minecraft.server.MinecraftServer.h(MinecraftServer.java:348)
    at net.minecraft.server.MinecraftServer.run(MinecraftServer.java:272)
    at net.minecraft.server.ThreadServerApplication.run(SourceFile:366)
2011-03-02 01:43:02 [INFO] <admin> I hope we find a solution as soon as ever possible. 
2011-03-02 01:43:01 [SEVERE] Another multi
line
log
message
2011-03-02 01:43:01 [INFO] <admin> Here's the error: [SEVERE] Could not pass event REDSTONE_CHANGE to CraftBookMechanisms

And here's the output:

$ python minecraftlog.py < minecraft.log 
2011-03-02 01:43:00 [INFO] <admin> CraftBook is causing errors.
2011-03-02 01:43:01 [INFO] <admin> Is it working yet?
2011-03-02 01:43:01 [INFO] <admin> Not really.
2011-03-02 01:43:02 [INFO] <admin> I hope we find a solution as soon as ever possible.
2011-03-02 01:43:01 [SEVERE] Another multi
line
log
message
2011-03-02 01:43:01 [INFO] <admin> Here's the error: [SEVERE] Could not pass event REDSTONE_CHANGE to CraftBookMechanisms
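
For a file of this size it can be convenient to wrap the same two-state logic in a generator and stream from one file to another instead of relying on shell redirection. The following is a minimal sketch, assuming Python 3. The names filter_log and filter_file, the file names in the usage note, and the UTF-8 encoding are illustrative assumptions, not part of the script above.

```python
import re

# Same patterns as in the script above.
START_OF_MESSAGE_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
ERROR_RE = re.compile(
    START_OF_MESSAGE_RE.pattern
    + r" \[SEVERE\] Could not pass event REDSTONE_CHANGE to CraftBookMechanisms$"
)


def filter_log(lines):
    """Yield lines that are not part of a REDSTONE_CHANGE error message."""
    skipping = False
    for line in lines:
        line = line.rstrip()
        if START_OF_MESSAGE_RE.match(line):
            # A timestamped line starts a new message and resets the state.
            skipping = ERROR_RE.match(line) is not None
        if not skipping:
            yield line


def filter_file(src_path, dst_path):
    """Stream src_path to dst_path one line at a time (hypothetical helper)."""
    # Assumption: the log is UTF-8; undecodable bytes are replaced, not fatal.
    with open(src_path, encoding="utf-8", errors="replace") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in filter_log(src):
            dst.write(line + "\n")
```

It could be called as filter_file("minecraft.log", "filtered.log"). Only one line is held in memory at a time, so the 6 GB input should not be a problem.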

Upvotes: 2

ewh

Reputation: 1024

It seems the better approach is to match the lines you want to keep, which indirectly removes the lines you do not care about.

The following Perl script should suffice:

while (<>) {
  next unless /^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\s\[INFO\]/;
  print;
}
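
For comparison with the Python answer, the same keep-only idea could be sketched in Python roughly as follows. This is an illustrative translation rather than part of the original answer, and the name keep_info_lines is hypothetical. Note that this approach keeps only lines that start with a timestamp followed by [INFO], so messages at other levels and continuation lines of multi-line messages are dropped as well.

```python
import re

# Timestamp followed by the [INFO] level, mirroring the Perl pattern above.
INFO_RE = re.compile(r"^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\s\[INFO\]")


def keep_info_lines(lines):
    """Yield only the lines that begin with a timestamp and [INFO]."""
    for line in lines:
        if INFO_RE.match(line):
            yield line
```

To filter stdin to stdout, one could call sys.stdout.writelines(keep_info_lines(sys.stdin)) after importing sys. Unlike the Perl loop over <>, this sketch does not read file names from the command line.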

Upvotes: 0
