lars_bx

Reputation: 93

Use RegEx in Java to extract parameters in between parentheses

I'm writing a utility to extract the names of header files from JSPs. Reading the JSPs line by line and finding the lines I need is no problem; extracting the specific text with a regex is. I've looked at many similar questions and I'm still hitting a brick wall.

An example of the String I'll be matching from within is:

<jsp:include page="<%=Pages.getString(\"MY_HEADER\")%>" flush="true"></jsp:include>

All I need is MY_HEADER for this example. Any time I have this tag:

<%=Pages.getString

I need what comes between this:

<%=Pages.getString(\"

and this:

\")%>

Here is what I have currently (which is not working, I might add):

// Requires java.util.regex.Pattern and java.util.regex.Matcher;
// fileReader is assumed to be a BufferedReader over the JSP file.
Pattern pattern = Pattern.compile("<%=Pages\\.getString\\(\\\\\"([^\\\\]*)");
String currentLine;
while ((currentLine = fileReader.readLine()) != null)
{
    Matcher matcher = pattern.matcher(currentLine);
    while (matcher.find()) {
        System.out.println(matcher.group(1));
    }
}

I need to be able to use the Java RegEx API and regex to extract those header names.

Any help on this issue is greatly appreciated. Thanks!

EDIT:

Resolved this issue, thankfully. The tricky part was that, even with the right regex, the string I was feeding to it always contains two backslash (\) characters (\"MY_HEADER\"), and those had to be escaped in the pattern.

Here is what worked (thanks to the help ;-)):

Pattern pattern = Pattern.compile("<%=Pages\\.getString\\(\\\\\"([^\\\\\"]*)"); 
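Putting it together, here is a self-contained sketch of the line-by-line scan using that pattern. It assumes the JSP is read through a BufferedReader; a StringReader stands in for the file, and the class and method names (HeaderExtractor, extractHeaders) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeaderExtractor {
    // Same pattern as above: the literal text <%=Pages.getString(\" followed by
    // a capture of every character that is neither a backslash nor a double quote.
    static final Pattern HEADER =
            Pattern.compile("<%=Pages\\.getString\\(\\\\\"([^\\\\\"]*)");

    // Hypothetical helper: returns every header name found, in file order.
    public static List<String> extractHeaders(BufferedReader reader) throws IOException {
        List<String> names = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            Matcher matcher = HEADER.matcher(line);
            while (matcher.find()) {
                names.add(matcher.group(1));
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Java literal for the JSP text; \\\" in the literal is \" in the file.
        String jsp = "<jsp:include page=\"<%=Pages.getString(\\\"MY_HEADER\\\")%>\" flush=\"true\"></jsp:include>\n"
                + "<p>no header here</p>\n";
        List<String> names = extractHeaders(new BufferedReader(new StringReader(jsp)));
        System.out.println(names); // prints [MY_HEADER]
    }
}
```

Compiling the pattern once, outside the read loop, avoids recompiling it for every line.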

Upvotes: 2

Views: 1063

Answers (1)

Roddy of the Frozen Peas

Reputation: 15185

This should do the trick:

<%=Pages\\.getString\\(\\\\\"([^\\\\]*)

Yeah, that's a scary number of backslashes. matcher.group(1) should return MY_HEADER. The group starts right after the \" and matches everything up to the next \ (which I assume here will be the one in \")%>).
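If the backslashes get in the way, one option (a swapped-in technique, not the pattern given above) is to let Pattern.quote escape the fixed prefix, so only the Java string escaping remains. A minimal sketch; the class QuotedPattern and method firstHeader are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedPattern {
    // Pattern.quote makes the literal text <%=Pages.getString(\" match exactly,
    // so ( and . need no regex escaping. The Java literal "\\\"" is the two
    // characters \" as they appear in the JSP. The capture stops at the next \.
    static final Pattern HEADER = Pattern.compile(
            Pattern.quote("<%=Pages.getString(\\\"") + "([^\\\\]*)");

    // Returns the first captured header name, or null if the line has none.
    static String firstHeader(String line) {
        Matcher matcher = HEADER.matcher(line);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "<%=Pages.getString(\\\"MY_HEADER\\\")%>";
        System.out.println(firstHeader(line)); // prints MY_HEADER
    }
}
```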

Of course, if your target text contains a backslash (\), this will not work. But you didn't give an indication that you'd ever be looking for something like <%=Pages.getString(\"Fun!\Yay!\")%> -- where this regex would only return Fun! and ignore the rest.
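To illustrate that limitation, here is a sketch comparing the pattern above with a reluctant quantifier that stops at the closing \" instead of at the first backslash. The reluctant version is an alternative of my own and assumes the header name never contains the two-character sequence \" itself; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackslashDemo {
    // Pattern from the answer: the capture stops at the first backslash.
    static final Pattern UP_TO_BACKSLASH =
            Pattern.compile("<%=Pages\\.getString\\(\\\\\"([^\\\\]*)");
    // Alternative: reluctant .*? captures up to the first \" that follows.
    static final Pattern UP_TO_CLOSING_QUOTE =
            Pattern.compile("<%=Pages\\.getString\\(\\\\\"(.*?)\\\\\"");

    // Returns group 1 of the first match, or null if there is no match.
    static String capture(Pattern pattern, String line) {
        Matcher matcher = pattern.matcher(line);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // In the file this line reads: <%=Pages.getString(\"Fun!\Yay!\")%>
        String line = "<%=Pages.getString(\\\"Fun!\\Yay!\\\")%>";
        System.out.println(capture(UP_TO_BACKSLASH, line));     // prints Fun!
        System.out.println(capture(UP_TO_CLOSING_QUOTE, line)); // prints Fun!\Yay!
    }
}
```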

EDIT

Your test case was failing because you were using this test string:

String currentLine = "<%=Pages.getString(\"MY_HEADER\")%>"; 

This is the equivalent of reading it in from a file and seeing:

<%=Pages.getString("MY_HEADER")%> 

Note the lack of any \. You need to use this instead:

String currentLine = "<%=Pages.getString(\\\"MY_HEADER\\\")%>"; 

This is equivalent to the line as it appears in your JSP file.
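A quick way to see the difference is to print both literals and run the pattern against each. A minimal sketch; the class and constant names are hypothetical.

```java
import java.util.regex.Pattern;

public class LiteralDemo {
    static final Pattern HEADER =
            Pattern.compile("<%=Pages\\.getString\\(\\\\\"([^\\\\]*)");

    // \" in a Java literal is just a quote character: no backslash survives.
    static final String WITHOUT_BACKSLASH = "<%=Pages.getString(\"MY_HEADER\")%>";
    // \\\" is an escaped backslash followed by an escaped quote: \" at runtime.
    static final String WITH_BACKSLASH = "<%=Pages.getString(\\\"MY_HEADER\\\")%>";

    public static void main(String[] args) {
        System.out.println(WITHOUT_BACKSLASH); // <%=Pages.getString("MY_HEADER")%>
        System.out.println(WITH_BACKSLASH);    // <%=Pages.getString(\"MY_HEADER\")%>
        System.out.println(HEADER.matcher(WITHOUT_BACKSLASH).find()); // false
        System.out.println(HEADER.matcher(WITH_BACKSLASH).find());    // true
    }
}
```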

This is test code that works:

// Requires java.util.regex.Pattern and java.util.regex.Matcher
String currentLine = "<%=Pages.getString(\\\"MY_HEADER\\\")%>";
Pattern pattern = Pattern.compile("<%=Pages\\.getString\\(\\\\\"([^\\\\]*)");
Matcher matcher = pattern.matcher(currentLine);
while (matcher.find()) {
    System.out.println(matcher.group(1)); // prints MY_HEADER
}

Upvotes: 2
