Tanmay Bhatt

Reputation: 25

How to extract the source IP address and info with regular expressions?

I want to extract the source IP address and info with regular expressions.

Here's a sample from the text file:

"No.","Time","Source","Destination","Protocol","Length","Info","SrcPort","Dest.port","Response time","Frequency","delta"
"","2007-11-13 18:10:53.940873","127.0.0.1","127.0.0.1","HTTP","162","GET /scripts/..%25%35%63../winnt/system32/cmd.exe?/c+dir HTTP/1.0 ","43974","80","0.000000","","0.000000"
             I want to extract...    ^ this    ... and ...                     ^ this info

The file can contain thousands of lines. I just want to extract the source IP address and the info from each line.

The expected output would be:

127.0.0.1 GET /scripts/..%25%35%63../winnt/system32/cmd.exe?/c+dir HTTP/1.0

Upvotes: 0

Views: 895

Answers (4)

SubOptimal

Reputation: 22973

If you can ensure that a comma is never part of fields 0-6, you could use the following:

String[] fields = s.split(",", 8);
// index 2 is the Source column and index 6 is Info; the fields still carry their quotes
System.out.println("source: " + fields[2].replace("\"", ""));
System.out.println("info  : " + fields[6].replace("\"", "").trim());

If you cannot ensure it, prefer a CSV parser over a regex solution.
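
To illustrate why commas inside quoted fields matter, here is a minimal sketch of a quote-aware split. It is not a replacement for a real CSV library. The class and method names (QuotedCsvSplit, splitQuoted) are made up for this example, and it assumes that fields are wrapped in double quotes and that a literal quote inside a field is written as two quotes.

```java
import java.util.ArrayList;
import java.util.List;

class QuotedCsvSplit {

    // Splits one CSV line on commas that are outside double quotes.
    // A doubled quote ("") inside a quoted field is read as one literal quote.
    static List<String> splitQuoted(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a field
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString()); // field boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"\",\"2007-11-13 18:10:53.940873\",\"127.0.0.1\",\"127.0.0.1\",\"HTTP\",\"162\","
                + "\"GET /scripts/..%25%35%63../winnt/system32/cmd.exe?/c+dir HTTP/1.0 \","
                + "\"43974\",\"80\",\"0.000000\",\"\",\"0.000000\"";
        List<String> fields = splitQuoted(line);
        // Index 2 is Source and index 6 is Info, following the sample header.
        System.out.println(fields.get(2) + " " + fields.get(6).trim());
    }
}
```

Unlike a plain split(","), this keeps a value such as "a,b" together as one field, so the column indexes stay stable even if the Info column contains a comma.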

Upvotes: 1

callOfCode

Reputation: 1026

If you want to do this purely with regex:

public static void main(String[] args)
{   
    String s = "No.\",\"Time\",\"Source\",\"Destination\",\"Protocol\",\"Length\",\"Info\",\"SrcPort\",\"Dest.port\",\"Response time\",\"Frequency\",\"delta\",\"2007-11-13 18:10:53.940873\",\"127.0.0.1\",\"127.0.0.1\",\"HTTP\",\"162\",\"GET /scripts/..%25%35%63../winnt/system32/cmd.exe?/c+dir HTTP/1.0 \",\"43974\",\"80\",\"0.000000\",\"\",\"0.000000";
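    // Requires java.util.regex.Matcher and java.util.regex.Pattern.
    // IP: four dot-separated groups of 1-3 digits; URI: the non-space token between the method and the HTTP version.
    Matcher m = Pattern.compile("(?<IP>(?:\\d{1,3}\\.){3}\\d{1,3}).*?(?<METHOD>GET|POST|PUT|DELETE)\\s+(?<URI>\\S+)\\s+(?<HTTPVERSION>HTTP/\\d(?:\\.\\d)?)").matcher(s);
    if (m.find()) {
        System.out.println("Result " + m.group("IP") + " " + m.group("METHOD") + " " + m.group("URI") + " " + m.group("HTTPVERSION"));
    }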
}

P.S. Named groups have been available since Java 7. I used them only for convenience; you could achieve the same result without them. In any case, I wouldn't rely heavily on regexes for such tasks. If you add even one more rule or condition, the regex grows very quickly. Regex is not a magic wand. Use it with caution.

Upvotes: 1

Patrick

Reputation: 12734

If you want simple Java code combined with a regex, you can try this example:

    String text = "No.,Time,Source,Destination,Protocol,Length,Info,SrcPort,Dest.port,Response time,Frequency,delta,2007-11-13 18:10:53.940873,127.0.0.1,127.0.0.1,HTTP,162,GET /scripts/..%25%35%63../winnt/system32/cmd.exe?/c+dir HTTP/1.0 ,43974,80,0.000000,,0.000000";

    String[] texts = text.split(",");
    StringBuilder output = new StringBuilder();

    boolean foundIp = false;
    for(String s : texts){
        if(s.matches("^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$") && !foundIp){
            output.append(s);
            foundIp = true;
            continue;
        }
        if(s.startsWith("GET") && s.trim().endsWith("HTTP/1.0")){
            output.append(" ").append(s.trim());
            continue;
        }
    }

    System.out.println(output.toString());

You can add other rules as needed, for example printing nothing when no IP address is found.

Output of the code:

127.0.0.1 GET /scripts/..%25%35%63../winnt/system32/cmd.exe?/c+dir HTTP/1.0

Upvotes: 0

ArchLicher

Reputation: 141

This matches both IP addresses (the first match on a line is the source): (\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}) And this matches the info: (GET.*?)" -> this will give you the info in the first group.

It is better to use a CSV parser, though, as suggested in the comments.
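
If you stay with a regex anyway, one option is to anchor the pattern to the column positions instead of searching for an IP-shaped string and the word GET. The following is only a sketch under assumptions: every field is wrapped in double quotes, no field contains a quote character, and Source and Info are the third and seventh columns as in the sample header. The names SourceInfoRegex and extract are invented for this example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SourceInfoRegex {

    // Skips two quoted fields, captures the third (Source),
    // skips three more, then captures the seventh (Info).
    private static final Pattern ROW = Pattern.compile(
            "^(?:\"[^\"]*\",){2}\"([^\"]*)\",(?:\"[^\"]*\",){3}\"([^\"]*)\"");

    // Returns "source info" for a matching row, or empty if the line has another shape.
    static Optional<String> extract(String line) {
        Matcher m = ROW.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(m.group(1) + " " + m.group(2).trim());
    }

    public static void main(String[] args) {
        String[] lines = {
            "\"No.\",\"Time\",\"Source\",\"Destination\",\"Protocol\",\"Length\",\"Info\",\"SrcPort\"",
            "\"\",\"2007-11-13 18:10:53.940873\",\"127.0.0.1\",\"127.0.0.1\",\"HTTP\",\"162\","
                + "\"GET /scripts/..%25%35%63../winnt/system32/cmd.exe?/c+dir HTTP/1.0 \",\"43974\""
        };
        // Start at index 1 to skip the header row.
        for (int i = 1; i < lines.length; i++) {
            extract(lines[i]).ifPresent(System.out::println);
        }
    }
}
```

Because the pattern counts columns, it also works for rows whose info does not start with GET, but it breaks as soon as a field contains an embedded quote, which is one more reason to prefer a CSV parser.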

Upvotes: 0
