javacavaj

Reputation: 2971

Parsing Numeric Values with Java's Regular Expression Classes

In Java, I'm attempting to parse data from an ASCII output file. A sample of the data is shown below. The values are formatted with precision 5 and scale 3 (two integer digits and three decimal digits), and no space separates the values.

80.234 <- 1 value
71.01663.129 <- 2 values ...
67.09159.25353.997
56.02759.77859.25057.749
55.86558.46958.64861.72855.969

What regular expression pattern can I use to match the numeric values and split them into groups? The pattern (\d+\.\d{1,3}) matches a single value. However, when I add a quantifier for the number of values on the line, it does not give the result I expected. For example, I expected the following to find 10 groups.

String testPattern = "68.65761.25659.01057.67657.14857.06457.41658.77861.16268.641";

// create a pattern to match the output
Pattern p = Pattern.compile("(\\d+\\.\\d{1,3}){10}");

Matcher m = p.matcher(testPattern);

if (m.find())
{
    String group = m.group();
}

Upvotes: 0

Views: 1614

Answers (4)

Alan Moore

Reputation: 75242

You're expecting it to somehow break out the individual numbers because that's how you matched them, but it doesn't work that way. What your regex does is capture one number at a time and place it into group #1. Ten times it does this, each time overwriting the contents of group #1 with the new value. When it's done, group() returns the whole string as you discovered, while group(1) would return only the tenth number, 68.641.
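A minimal sketch of that behavior, using the pattern and test string from the question. The class and method names (GroupOverwriteDemo, matchTen) are made up for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupOverwriteDemo {
    // Returns { whole match, final contents of group 1 }, or null if nothing matches.
    static String[] matchTen(String input) {
        Pattern p = Pattern.compile("(\\d+\\.\\d{1,3}){10}");
        Matcher m = p.matcher(input);
        if (!m.find()) {
            return null;
        }
        // The {10} quantifier repeats group 1; it does not create groups 2..10.
        return new String[] { m.group(), m.group(1) };
    }

    public static void main(String[] args) {
        String input = "68.65761.25659.01057.67657.14857.06457.41658.77861.16268.641";
        String[] r = matchTen(input);
        System.out.println(r[0]); // the entire input string
        System.out.println(r[1]); // 68.641, the last repetition only
    }
}
```

The first nine values are matched but are not retrievable from the Matcher afterwards, because each repetition replaces what group 1 held before.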

This is a common error, probably due to Java's lack of a built-in "find all matches" mechanism. .NET has its Matches() methods, PHP has preg_match_all(), Python has re.findall(), Perl and JavaScript have the /g modifier... every major flavor has a mechanism to return either an array of all matches or an iterator over the matches, or both. But in Java you're expected to call find() in a while loop, as @KennyTM demonstrated.

It's an annoying omission, but not really a surprising one, for Java. Its effect is to force us to write more verbose, less idiomatic code, which has been a Java hallmark from the very beginning. But if you really want to reduce this task to a one-liner, there's the old "split on a lookaround" trick:

String[] result = source.split("(?=\\B\\d{2}\\.\\d{3})");

...or:

String[] result = source.split("(?<=\\G\\d{2}\\.\\d{3})");
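For what it's worth, here is a small runnable sketch of the lookahead variant. It assumes every value has exactly two integer digits and three decimal digits, as in the question's data; the class and method names are hypothetical.

```java
import java.util.Arrays;

class LookaheadSplitDemo {
    // Splits before each "dd.ddd" that is not at a word boundary,
    // which rules out a split at the very start of the string.
    static String[] splitValues(String source) {
        return source.split("(?=\\B\\d{2}\\.\\d{3})");
    }

    public static void main(String[] args) {
        String source = "68.65761.25659.01057.67657.14857.06457.41658.77861.16268.641";
        String[] result = splitValues(source);
        System.out.println(result.length);           // 10
        System.out.println(Arrays.toString(result)); // [68.657, 61.256, ..., 68.641]
    }
}
```

If the field width ever varies, the lookahead no longer lines up with the value boundaries, so this is only as reliable as the fixed format.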

Upvotes: 2

ColinD

Reputation: 110054

Using Guava, a fixed-length Splitter would work well here.

Iterable<String> numbers = Splitter.fixedLength(6).split(testPattern);

If you were to create a Function<String, Double> (called, say, Numbers.doubleParser()), you could even convert the data to numbers easily. (Obviously you could use BigDecimal or whatever rather than Double depending on your needs.)

private static final Splitter SPLITTER = Splitter.fixedLength(6);

...

public void someMethod(String stringToParse) {
  for(Double value : Iterables.transform(SPLITTER.split(stringToParse),
                                         Numbers.doubleParser())) {
    ...
  }
}

Upvotes: 1

kennytm

Reputation: 523474

Your regex contains only one capturing group, no matter how many times it repeats. Use a while loop to enumerate all of the matches instead. (See http://www.ideone.com/FNRsz):

String testPattern = "68.65761.25659.01057.67657.14857.06457.41658.77861.16268.641";
Pattern p = Pattern.compile("\\d+\\.\\d{1,3}");
Matcher m = p.matcher(testPattern);

while(m.find())   // <---
   System.out.println(m.group());
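If you want the values as numbers rather than printed strings, the same loop can collect them into a list. This is only a sketch; FindAllDemo and parseValues are names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllDemo {
    // Same pattern as above, compiled once and reused.
    private static final Pattern VALUE = Pattern.compile("\\d+\\.\\d{1,3}");

    // Collects every match on the line and converts it to a double.
    static List<Double> parseValues(String line) {
        List<Double> values = new ArrayList<>();
        Matcher m = VALUE.matcher(line);
        while (m.find()) {
            values.add(Double.parseDouble(m.group()));
        }
        return values;
    }

    public static void main(String[] args) {
        String line = "68.65761.25659.01057.67657.14857.06457.41658.77861.16268.641";
        System.out.println(parseValues(line).size()); // 10
    }
}
```

Note that this relies on every value having exactly three decimal digits, since that is what stops \d{1,3} from running into the next number. On Java 9 or later, Matcher.results() gives a stream over the same matches if you prefer that to the loop.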

Upvotes: 2

Brian Knoblauch

Reputation: 21389

If the values are all identically formatted, perhaps it would be easier to just read six characters at a time as a string, then use Double.parseDouble to convert each one to a double?
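Something along these lines, as a rough sketch. It assumes each line is an exact multiple of six characters with no trailing annotation, and the names FixedWidthDemo and parseFixedWidth are just placeholders.

```java
import java.util.ArrayList;
import java.util.List;

class FixedWidthDemo {
    private static final int WIDTH = 6; // "dd.ddd": assumed constant field width

    // Cuts the line into six-character fields and parses each one.
    static List<Double> parseFixedWidth(String line) {
        if (line.length() % WIDTH != 0) {
            throw new IllegalArgumentException("Line length is not a multiple of " + WIDTH);
        }
        List<Double> values = new ArrayList<>();
        for (int i = 0; i < line.length(); i += WIDTH) {
            values.add(Double.parseDouble(line.substring(i, i + WIDTH)));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parseFixedWidth("67.09159.25353.997")); // [67.091, 59.253, 53.997]
    }
}
```

No regex is involved, and a malformed field fails loudly with a NumberFormatException instead of being silently skipped.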

Upvotes: 4
