Reputation: 1920
I have a MySQL log file that records all sorts of information on each line (when a connection was made, when a query was made, when the connection was ended, etc.). I have to parse the log file so I can take the data on each line, put it in an array, and then do some calculations based on it.
Here is a sample from the log file:
151011 12:52:51 1 Connect [email protected] on testdb
1 Query SHOW SESSION VARIABLES
1 Query SHOW COLLATION
1 Query SET character_set_results = NULL
1 Query SET autocommit=1
1 Query SELECT q1,q2 FROM q_table
1 Query SELECT s1,s2 FROM s_table
1 Query select count(*) as c from i_table WHERE val = 1
1 Query select count(*) as c from k_table WHERE cid = 1
1 Query SELECT name,age FROM i_table WHERE ck = 1
151011 12:52:54 1 Query SELECT name,aid FROM j_table WHERE co = 1
151011 12:52:59 1 Query SELECT * from values where lastname='smith'
Unfortunately, the fields in a line are not all separated by a tab character ("\t"). Worse, some lines have an additional date and time at the beginning while others don't, which means some lines have more data to parse than others. How would I parse this log file?
So far, I have the following:
Scanner scan = new Scanner(new File("data.log"));
String ln = scan.nextLine();    // first line of the log
String[] ar = ln.split("\t");   // splits on tab characters only
System.out.println(ar[0]);
System.out.println(ar[1]);
But that prints the following line, for example:
151011 12:52:51 // First slot in the array
1 Connect [email protected] on testdb // Second slot in the array
Is there any way to do this? Or is it just not possible?
Upvotes: 1
Views: 280
Reputation: 541
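It seems to me you want a regex with the following groups, separated by whitespace:
an optional date and time (six digits, a space, then hh:mm:ss)
a number
the command type (Connect or Query)
a message that starts with a non-whitespace character and continues with anything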
// Needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
String dateTime, number, type, message;
// \\s* after the optional date lets undated lines match even without leading whitespace
Pattern pattern = Pattern.compile(
    "(\\d{6} \\d{2}:\\d{2}:\\d{2})?\\s*(\\d+)\\s+(Connect|Query)\\s+(\\S.*)");
Matcher matcher = pattern.matcher(ln);
if (matcher.matches()) {
    dateTime = matcher.group(1); // null if the line has no date/time prefix
    number = matcher.group(2);
    type = matcher.group(3);
    message = matcher.group(4);
}
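To apply this to more than one line, the match can be wrapped in a small helper that returns one object per log line. The following is only a sketch: the class name LogLineParser, the nested Entry type, and the parse/parseAll methods are made up for illustration, and the pattern assumes every line of interest is a Connect or Query line laid out like the sample in the question. It uses a variant of the pattern that also accepts lines with no leading whitespace. Lines that do not match are skipped rather than reported.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineParser {

    /** One parsed log line; dateTime is null when the line has no date/time prefix. */
    public static final class Entry {
        public final String dateTime;
        public final long connectionId;
        public final String type;
        public final String message;

        Entry(String dateTime, long connectionId, String type, String message) {
            this.dateTime = dateTime;
            this.connectionId = connectionId;
            this.type = type;
            this.message = message;
        }
    }

    // Assumed layout: [yymmdd hh:mm:ss] <number> <Connect|Query> <message>
    private static final Pattern LINE = Pattern.compile(
        "\\s*(\\d{6} \\d{2}:\\d{2}:\\d{2})?\\s*(\\d+)\\s+(Connect|Query)\\s+(\\S.*)");

    /** Returns the parsed entry, or null if the line does not fit the assumed layout. */
    public static Entry parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new Entry(m.group(1), Long.parseLong(m.group(2)), m.group(3), m.group(4));
    }

    /** Parses every line and keeps only the ones that matched. */
    public static List<Entry> parseAll(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        for (String line : lines) {
            Entry e = parse(line);
            if (e != null) {
                entries.add(e);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Two lines taken from the sample log in the question.
        List<String> sample = Arrays.asList(
            "1 Query SHOW SESSION VARIABLES",
            "151011 12:52:54 1 Query SELECT name,aid FROM j_table WHERE co = 1");
        for (Entry e : parseAll(sample)) {
            System.out.println(e.dateTime + " | " + e.connectionId
                + " | " + e.type + " | " + e.message);
        }
    }
}
```

With the entries in a list, the calculations can work on typed fields instead of raw array slots. In a real program the lines would come from the Scanner loop over data.log rather than a hard-coded list.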
Upvotes: 2