Dr. Watson

Reputation: 3820

Java regex from Perl-type regex

I'm trying to extract hours, minutes, seconds, and nanoseconds from a timestamp string in a log file. Here is the input string I am testing with:

 SOME_TEXT,+09:30:01.040910105,SOME_TEXT,SOME_TEXT,SOME_TEXT

In Perl/Python, I would use the following regex to group the fields I am interested in:

 (\d\d)\:(\d\d)\:(\d\d)\.(\d{9})

You can verify that the regex works with the test string at http://regexpal.com if you're curious.

So I tried to write a simple Java program that can extract the fields:

import java.util.regex.*;

public class Driver
{
  static public void main(String[] args)
  {
    String t = new String("SOME_TEXT,+09:30:01.040910105,SOME_TEXT,SOME_TEXT,SOME_TEXT");
    Pattern regex = Pattern.compile("(\\d\\d):(\\d\\d):(\\d\\d)\\.(\\d{9})");
    Matcher matches = regex.matcher(t);
    for (int i=1; i<matches.groupCount(); ++i)
    {
      System.out.println(matches.group(i));
    }
  }
}

My regex did not translate correctly, however. The following exception shows that it did not find any matches:

 Exception in thread "main" java.lang.IllegalStateException: No match found
   at java.util.regex.Matcher.group(Matcher.java:485)
   at Driver.main(Driver.java:12)

How would I properly translate the regex from Perl/Python style to Java?

Upvotes: 3

Views: 1203

Answers (4)

TraderJoeChicago

Reputation: 6315

Java breaks with the Perl style, introducing complexity where none is needed. If you want to do regular expressions in Java the right way, take a look at MentaRegex. Below are some examples:

The method matches returns a boolean saying whether we have a regex match or not.

matches("Sergio Oliveira Jr.", "/oliveira/i" ) => true

The method match returns an array with the groups matched. So it not only tells you whether you have a match or not but it also returns the groups matched in case you have a match.

match("aa11bb22", "/(\\d+)/g" ) => ["11", "22"]

The method sub allows you to perform substitutions with regex.

sub("aa11bb22", "s/\\d+/00/g" ) => "aa00bb00"

It supports global and case-insensitive regex.

match("aa11bb22", "/(\\d+)/" ) => ["11"]
match("aa11bb22", "/(\\d+)/g" ) => ["11", "22"]
matches("Sergio Oliveira Jr.", "/oliveira/" ) => false
matches("Sergio Oliveira Jr.", "/oliveira/i" ) => true

Allows you to change the escape character in case you don't like to see so many '\'.

match("aa11bb22", "/(\\d+)/g" ) => ["11", "22"]
match("aa11bb22", "/(#d+)/g", '#' ) => ["11", "22"]

Upvotes: 2

Kent

Reputation: 195049

Oh, no! I copied your code and wrapped it in if (matches.find()) { ... }, and then it worked. You need this.

Also, the nanoseconds were missing. You should make this change:

for (int i = 1; i <= matches.groupCount(); ++i)
-------------------^
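
To show why the loop bound matters, here is a minimal sketch (the class name GroupCountDemo and the helper extractFields are made up for illustration). groupCount() does not count group 0, the whole match, so with four capturing groups the loop has to run up to and including 4:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountDemo {
    // Same pattern as in the question.
    private static final Pattern TIMESTAMP =
        Pattern.compile("(\\d\\d):(\\d\\d):(\\d\\d)\\.(\\d{9})");

    // Hypothetical helper: returns the captured fields of the first match,
    // or an empty list if the line contains no timestamp.
    static List<String> extractFields(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = TIMESTAMP.matcher(line);
        if (m.find()) {
            // groupCount() is 4 here; group(0) is the whole match and is not counted.
            for (int i = 1; i <= m.groupCount(); ++i) {
                fields.add(m.group(i));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "SOME_TEXT,+09:30:01.040910105,SOME_TEXT,SOME_TEXT,SOME_TEXT";
        System.out.println(extractFields(line)); // [09, 30, 01, 040910105]
    }
}
```

With i < m.groupCount() the loop would stop at group 3 and drop the nanoseconds.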

Upvotes: 0

Matthew Farwell

Reputation: 61705

Java's Matcher.matches() matches against the whole string, so if you use it rather than find() you have to add .* to the beginning and end:

Pattern regex = Pattern.compile(".*(\\d\\d):(\\d\\d):(\\d\\d)\\.(\\d{9}).*");

and that should work, with the other corrections to your for loop as necessary :-)
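
To see the difference between whole-string matching and searching, here is a minimal sketch (the class and helper names are made up for illustration). Matcher.matches() requires the entire input to match the pattern, while Matcher.find() looks for a match anywhere in the input, so the .* padding is only needed with matches():

```java
import java.util.regex.Pattern;

public class MatchModesDemo {
    // The timestamp pattern from the question, without any padding.
    static final String CORE = "(\\d\\d):(\\d\\d):(\\d\\d)\\.(\\d{9})";

    // True only if the whole input matches the regex.
    static boolean wholeStringMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if the regex matches some substring of the input.
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String line = "SOME_TEXT,+09:30:01.040910105,SOME_TEXT,SOME_TEXT,SOME_TEXT";
        System.out.println(wholeStringMatches(CORE, line));               // false
        System.out.println(containsMatch(CORE, line));                    // true
        System.out.println(wholeStringMatches(".*" + CORE + ".*", line)); // true
    }
}
```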

Upvotes: 0

NPE

Reputation: 500327

The regex itself is fine. There are, however, two problems with the code:

  1. you need to call Matcher.find();
  2. you need to fix the for loop (it should use <= instead of <).

Here is the corrected version:

import java.util.regex.*;

public class Driver
{
  static public void main(String[] args)
  {
    String t = new String("SOME_TEXT,+09:30:01.040910105,SOME_TEXT,SOME_TEXT,SOME_TEXT");
    Pattern regex = Pattern.compile("(\\d\\d):(\\d\\d):(\\d\\d)\\.(\\d{9})");
    Matcher matcher = regex.matcher(t);
    while (matcher.find()) {
        for (int i=1; i<=matcher.groupCount(); ++i)
        {
          System.out.println(matcher.group(i));
        }
    }
  }
}

This prints out:

09
30
01
040910105
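
As for the original exception: a freshly created Matcher has not attempted any match yet, so calling group() on it throws IllegalStateException regardless of whether the regex would match. A minimal sketch of that behaviour (the class and helper names are made up for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NoMatchAttemptedDemo {
    // Returns true if m.group(1) throws IllegalStateException in the matcher's current state.
    static boolean groupThrows(Matcher m) {
        try {
            m.group(1);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Pattern regex = Pattern.compile("(\\d\\d):(\\d\\d):(\\d\\d)\\.(\\d{9})");
        Matcher m = regex.matcher("SOME_TEXT,+09:30:01.040910105,SOME_TEXT");

        System.out.println(groupThrows(m)); // true: no match has been attempted yet
        System.out.println(m.find());       // true: the timestamp is found
        System.out.println(groupThrows(m)); // false: groups are now available
        System.out.println(m.group(1));     // 09
    }
}
```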

Upvotes: 3
