Reputation: 25
I want to extract the source IP address and info with regular expressions.
Here's a sample from the text file,
"No.","Time","Source","Destination","Protocol","Length","Info","SrcPort","Dest.port","Response time","Frequency","delta"
"","2007-11-13 18:10:53.940873","127.0.0.1","127.0.0.1","HTTP","162","GET /scripts/..%25%35%63../winnt/system32/cmd.exe?/c+dir HTTP/1.0 ","43974","80","0.000000","","0.000000"
I want to extract... ^ this ... and ... ^ this info
It can contain thousands of lines. I just want to extract the source IP address and info from each line.
Expected output would be,
127.0.0.1 GET /scripts/..%25%35%63../winnt/system32/cmd.exe?/c+dir HTTP/1.0
Upvotes: 0
Views: 895
Reputation: 22973
If you can ensure that a comma never appears inside fields 0-6, you could use the following:
String[] fields = s.split(",", 8);
// Source is the third column (index 2); index 3 would be Destination. The values keep their surrounding quotes.
System.out.println("source: " + fields[2]);
System.out.println("info  : " + fields[6]);
If you cannot ensure that, use a CSV parser instead of a regex solution.
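To illustrate what a CSV parser does differently from a plain split, here is a minimal sketch of a quote-aware line parser. It is not taken from any library: the class name CsvLineSketch and the method parseLine are made up for this example, and it assumes one record per line with "" as the escape for a quote inside a quoted field. A real project would more likely rely on an established CSV library.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvLineSketch {

    // Splits one CSV line into fields, honouring double quotes.
    // A doubled quote ("") inside a quoted field is read as a literal quote.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas are kept while inside quotes
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // The data row from the question.
        String line = "\"\",\"2007-11-13 18:10:53.940873\",\"127.0.0.1\",\"127.0.0.1\",\"HTTP\",\"162\","
                + "\"GET /scripts/..%25%35%63../winnt/system32/cmd.exe?/c+dir HTTP/1.0 \","
                + "\"43974\",\"80\",\"0.000000\",\"\",\"0.000000\"";
        List<String> f = parseLine(line);
        // Index 2 is Source and index 6 is Info in the question's header.
        System.out.println(f.get(2) + " " + f.get(6).trim());
    }
}
```

Because the quotes are consumed by the parser, the fields come back without them, and a comma inside the Info column no longer shifts the indexes.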
Upvotes: 1
Reputation: 1026
If you want to do this purely with regex:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args)
{
    String s = "No.\",\"Time\",\"Source\",\"Destination\",\"Protocol\",\"Length\",\"Info\",\"SrcPort\",\"Dest.port\",\"Response time\",\"Frequency\",\"delta\",\"2007-11-13 18:10:53.940873\",\"127.0.0.1\",\"127.0.0.1\",\"HTTP\",\"162\",\"GET /scripts/..%25%35%63../winnt/system32/cmd.exe?/c+dir HTTP/1.0 \",\"43974\",\"80\",\"0.000000\",\"\",\"0.000000";
    // IP: four dot-separated groups of 1-3 digits. URI stops before the HTTP version so it is not printed twice.
    Matcher m = Pattern.compile("(?<IP>\\d{1,3}(?:\\.\\d{1,3}){3}).*?(?<METHOD>GET|POST|PUT|DELETE)\\s+(?<URI>.*?)\\s+(?<HTTPVERSION>HTTP/\\d(?:\\.\\d)?)").matcher(s);
    if (m.find()) {
        System.out.println("Result " + m.group("IP") + " " + m.group("METHOD") + " " + m.group("URI") + " " + m.group("HTTPVERSION"));
    }
}
P.S. Named groups work since Java 7. I've used them only for convenience; you could achieve the same result without them. Anyway, I wouldn't rely heavily on regexes for such tasks. If you want to add even one more rule or condition, the regex grows very rapidly. Regex is not a magic wand. Use it with caution.
Upvotes: 1
Reputation: 12734
If you want simple Java code with a regex, you can try this example:
String text = "No.,Time,Source,Destination,Protocol,Length,Info,SrcPort,Dest.port,Response time,Frequency,delta,2007-11-13 18:10:53.940873,127.0.0.1,127.0.0.1,HTTP,162,GET /scripts/..%25%35%63../winnt/system32/cmd.exe?/c+dir HTTP/1.0 ,43974,80,0.000000,,0.000000";
String[] texts = text.split(",");
StringBuilder output = new StringBuilder();
boolean foundIp = false;
for (String s : texts) {
    if (s.matches("^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$") && !foundIp) {
        output.append(s);
        foundIp = true;
        continue;
    }
    if (s.startsWith("GET") && s.trim().endsWith("HTTP/1.0")) {
        output.append(" ").append(s.trim());
        continue;
    }
}
System.out.println(output.toString());
You can add further rules as needed, for example printing nothing when no IP address is found.
Output of the code:
127.0.0.1 GET /scripts/..%25%35%63../winnt/system32/cmd.exe?/c+dir HTTP/1.0
Upvotes: 0
Reputation: 141
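This matches both IP addresses: (\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}) (the dots must be escaped; an unescaped . matches any character, so the pattern would also match the date 2007-11-13). And this matches the info: (GET.*?)" which gives you the info in the first group.
That said, it is better to use a CSV parser, as suggested in the comments.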
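If a regex is still preferred, one option is to select the columns by position instead of searching for text that looks like an IP address. The sketch below assumes that every field is wrapped in double quotes and contains no embedded quote, as in the sample. The class name SourceInfoRegex and the method extract are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SourceInfoRegex {

    // Skip two quoted fields, capture the third (Source),
    // skip three more, then capture the seventh (Info).
    private static final Pattern ROW = Pattern.compile(
            "^(?:\"[^\"]*\",){2}\"([^\"]*)\",(?:\"[^\"]*\",){3}\"([^\"]*)\"");

    // Returns "source info" for a row, or null if the line does not have
    // at least seven quoted fields.
    static String extract(String line) {
        Matcher m = ROW.matcher(line);
        if (!m.find()) {
            return null;
        }
        return m.group(1) + " " + m.group(2).trim();
    }

    public static void main(String[] args) {
        String[] lines = {
            "\"No.\",\"Time\",\"Source\",\"Destination\",\"Protocol\",\"Length\",\"Info\",\"SrcPort\"",
            "\"\",\"2007-11-13 18:10:53.940873\",\"127.0.0.1\",\"127.0.0.1\",\"HTTP\",\"162\","
                + "\"GET /scripts/..%25%35%63../winnt/system32/cmd.exe?/c+dir HTTP/1.0 \",\"43974\""
        };
        // Start at index 1 to skip the header row, which would otherwise match too.
        for (int i = 1; i < lines.length; i++) {
            String result = extract(lines[i]);
            if (result != null) {
                System.out.println(result);
            }
        }
    }
}
```

Since [^"]* can contain commas, a comma inside the Info column does not break the match. Rows with escaped quotes inside a field are outside what this pattern handles.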
Upvotes: 0