Reputation: 20167
Let’s say I am looping through a text file and come across the following two strings, each made up of a word followed by integer values:
“foo 11 25”
“foo 38 15 976 24”
I write a regex pattern that would match both strings, for example:
((?:[a-z][a-z]+)\\s+\\d+\\s\\d+)
The problem is that I don’t think this regex would let me get at all 4 integer values in the 2nd string.
Q1.) How can I create a single pattern that leaves these 3rd and 4th integers optional?
Q2.) How do I write the matcher code to only go after the 3rd and 4th values when they are found by the pattern?
Here is a template program to help anyone willing to offer a hand. Thanks.
public void foo(String fooFile) throws IOException {
//Assume fooFile contains the two strings
//"foo 11 25"
//"foo 38 15 976 24"
Pattern p = Pattern.compile("((?:[a-z][a-z]+)\\s+\\d+\\s\\d+)", Pattern.CASE_INSENSITIVE);
BufferedReader br = new BufferedReader(new FileReader(fooFile));
String line;
while ((line = br.readLine()) != null) {
//Process the patterns
Matcher m1 = p.matcher(line);
if (m1.find()) {
int int1, int2, int3, int4;
//Need help to write the matcher code
}
}
}
Upvotes: 0
Views: 205
Reputation: 5395
If you want to retrieve every int value, you can use the regex:
[a-z]+\s(\d+)\s(\d+)\s?(\d+)?\s?(\d+)?
and each int will be in its own group, numbered 1 to 4. Groups 3 and 4 are null when the optional values are absent. Then you can use something like:
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args){
String[] strings = {"foo 11 25","foo 67 45 97",
"foo 38 15 976 24"};
for(String string : strings) {
ArrayList<Integer> numbers = new ArrayList<Integer>();
Matcher matcher = Pattern.compile("[a-z]+\\s(\\d+)\\s(\\d+)\\s?(\\d+)?\\s?(\\d+)?").matcher(string);
if (!matcher.find()) continue; // skip strings the pattern does not match
for(int i = 0; i < 4; i++){
if(matcher.group(i+1) != null) {
numbers.add(Integer.valueOf(matcher.group(i + 1)));
}else{
System.out.println("group " + (i+1) + " is " + matcher.group(i+1));
}
}
System.out.println("Match from string: "+ "\""+ string + "\"" + " : " + numbers.toString());
}
}
}
with output:
group 3 is null
group 4 is null
Match from string: "foo 11 25" : [11, 25]
group 4 is null
Match from string: "foo 67 45 97" : [67, 45, 97]
Match from string: "foo 38 15 976 24" : [38, 15, 976, 24]
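To tie this back to the matcher code asked for in Q2, here is a minimal sketch, not a definitive implementation. It assumes a variant of the pattern above in which each optional integer is wrapped together with its leading whitespace in a non-capturing group, and it checks `group(n) != null` before parsing the 3rd and 4th values. The class name `OptionalInts` and the method `parseLine` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalInts {
    // Groups 1 and 2 are required; groups 3 and 4 are optional.
    // Each optional integer is grouped with its leading whitespace,
    // so the whitespace is only consumed when a number follows it.
    private static final Pattern LINE = Pattern.compile(
            "[a-z]{2,}\\s+(\\d+)\\s+(\\d+)(?:\\s+(\\d+))?(?:\\s+(\\d+))?",
            Pattern.CASE_INSENSITIVE);

    // Returns the integers found on the line, or an empty list if the line does not match.
    static List<Integer> parseLine(String line) {
        List<Integer> values = new ArrayList<>();
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return values;
        }
        // The first two integers are always present after a successful match.
        values.add(Integer.parseInt(m.group(1)));
        values.add(Integer.parseInt(m.group(2)));
        // An optional group that did not participate in the match is null.
        if (m.group(3) != null) {
            values.add(Integer.parseInt(m.group(3)));
        }
        if (m.group(4) != null) {
            values.add(Integer.parseInt(m.group(4)));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("foo 11 25"));        // prints [11, 25]
        System.out.println(parseLine("foo 38 15 976 24")); // prints [38, 15, 976, 24]
    }
}
```

In the question's template, the same null checks would go inside the `if (m1.find())` block, assigning `int3` and `int4` only when their groups are non-null.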
Another way is to capture all the ints in one group with:
[a-z]+\s((?:\d+\s?)+)
and split matcher.group(1) on whitespace, which gives you a String[] of the values. Implementation in Java:
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args){
String[] strings = {"foo 11 25","foo 67 45 97",
"foo 38 15 976 24"};
for(String string : strings) {
ArrayList<Integer> numbers = new ArrayList<Integer>();
Matcher matcher = Pattern.compile("[a-z]+\\s((?:\\d+\\s?)+)").matcher(string);
if (!matcher.find()) continue; // skip strings the pattern does not match
String[] nums = matcher.group(1).split("\\s");
for(String num : nums){
numbers.add(Integer.valueOf(num));
}
System.out.println("Match from string: "+ "\""+ string + "\"" + " : " + numbers.toString());
}
}
}
with output:
Match from string: "foo 11 25" : [11, 25]
Match from string: "foo 67 45 97" : [67, 45, 97]
Match from string: "foo 38 15 976 24" : [38, 15, 976, 24]
Upvotes: 1
Reputation:
The regex pattern you are currently using requires exactly two numbers at the end (\s+\d+\s\d+). If you want it to allow any number of integers, each preceded by whitespace, you would use (\s+\d+)+ instead.
So the full regex would be ((?:[a-z][a-z]+)(\s+\d+)+)
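One caveat when reading the values back out: in Java, a capturing group inside a repetition keeps only the text of its last iteration, so group 2 of the pattern above holds just the final number. A rough sketch of one way around that, assuming the numeric tail is captured as a whole and a second `\d+` matcher is run over it. The class name `AllInts` and the methods `extract` and `lastIteration` are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AllInts {
    // The pattern from the answer: group 2 sits inside a repetition.
    private static final Pattern REPEATED = Pattern.compile(
            "((?:[a-z][a-z]+)(\\s+\\d+)+)", Pattern.CASE_INSENSITIVE);
    // Variant (an assumption, not from the answer): group 1 captures the whole numeric tail.
    private static final Pattern LINE = Pattern.compile(
            "[a-z]{2,}((?:\\s+\\d+)+)", Pattern.CASE_INSENSITIVE);
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Shows what the repeated group retains: only its last iteration.
    static String lastIteration(String line) {
        Matcher m = REPEATED.matcher(line);
        return m.find() ? m.group(2) : null;
    }

    // Validates the line with LINE, then collects every integer from the captured tail.
    static List<Integer> extract(String line) {
        List<Integer> values = new ArrayList<>();
        Matcher lineMatcher = LINE.matcher(line);
        if (!lineMatcher.find()) {
            return values;
        }
        Matcher numberMatcher = NUMBER.matcher(lineMatcher.group(1));
        while (numberMatcher.find()) {
            values.add(Integer.parseInt(numberMatcher.group()));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println("[" + lastIteration("foo 38 15 976 24") + "]"); // prints [ 24]
        System.out.println(extract("foo 38 15 976 24"));                   // prints [38, 15, 976, 24]
    }
}
```

This keeps the pattern open-ended (any count of integers) at the cost of a second pass over the matched tail.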
Upvotes: 0