Reputation: 55
I am trying to extract numbers of length 3 to 5 from a long string of text. Let me explain.
Say there is a string like this: 123456
and I want to extract all the numbers that are between 3 and 5 digits long, including overlapping ones. The output would be
123
234
345
456
1234
2345
3456
12345
23456
I can run multiple regexes that each find one length, but there might be a better way to do it than what I am doing:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTestHarness {
    public static void main(String[] args) {
        String data = "123456";
        Matcher m = Pattern.compile("\\d{3}").matcher(data);
        // m = Pattern.compile("\\d{4}").matcher(data);
        // m = Pattern.compile("\\d{5}").matcher(data);
        int position = 0;
        // find(int) resets the matcher and searches from the given index,
        // so stepping one character at a time yields overlapping matches
        while (m.find(position++)) {
            System.out.println(m.group());
        }
    }
}
Premature optimization idea: I can match everything up to length 5 and then run the shorter-length matchers on those results. That way I avoid reading the data over and over, which matters because in my case it comes from an external source.
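One way to sketch the single-pass idea above: capture the longest run of up to five digits at every starting position with a lookahead, then derive the shorter numbers as prefixes of that capture instead of rescanning the input. The class and method names (`PrefixWindows`, `extract`) are made up for illustration, and the sketch assumes the data has already been read into a String.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question.
class PrefixWindows {
    // At each position, capture the longest run of 3 to 5 digits without consuming it.
    private static final Pattern UP_TO_FIVE = Pattern.compile("(?=(\\d{3,5}))");

    static List<String> extract(String data) {
        List<String> result = new ArrayList<>();
        Matcher m = UP_TO_FIVE.matcher(data);
        while (m.find()) {
            String longest = m.group(1);
            // Every prefix of length 3..longest.length() is also a wanted number.
            for (int len = 3; len <= longest.length(); len++) {
                result.add(longest.substring(0, len));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        extract("123456").forEach(System.out::println);
    }
}
```

The results come out grouped by starting position rather than by length, so sort them afterwards if the order in the question matters.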
Upvotes: 1
Views: 981
Reputation:
You can do this with a single regex. Just run a global find and,
for each match, print capture groups 1, 2, and 3 if their lengths are greater than 0.
# "(?=(\\d{3}))(?=(\\d{4}))?(?=(\\d{5}))?"
(?=
( \d{3} ) # (1)
)
(?=
( \d{4} ) # (2)
)?
(?=
( \d{5} ) # (3)
)?
Perl test case
while ( '123456' =~ /(?=(\d{3}))(?=(\d{4}))?(?=(\d{5}))?/g )
{
    print "$1\n";
    if ( length ($2) ) {
        print "$2\n";
    }
    if ( length ($3) ) {
        print "$3\n";
    }
}
Output >>
123
1234
12345
234
2345
23456
345
3456
456
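Since the question is in Java, here is a rough sketch of the same single-regex idea there. It deliberately differs from the Perl pattern in one respect: the `?` is moved inside each lookahead, so the optional part is the capture group rather than the lookahead itself. Groups that did not take part in a match come back as `null` from `Matcher.group(int)`. The class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class SingleRegexWindows {
    // Group 1 is required; groups 2 and 3 are optional inside their lookaheads.
    private static final Pattern WINDOWS =
            Pattern.compile("(?=(\\d{3}))(?=(\\d{4})?)(?=(\\d{5})?)");

    static List<String> extract(String data) {
        List<String> result = new ArrayList<>();
        Matcher m = WINDOWS.matcher(data);
        // Each match is zero-width, so find() moves on by one character each time.
        while (m.find()) {
            for (int g = 1; g <= 3; g++) {
                String s = m.group(g);
                if (s != null) { // null means the group did not participate
                    result.add(s);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        extract("123456").forEach(System.out::println);
    }
}
```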
Upvotes: 2
Reputation: 25287
This looks a lot harder with regex. If you don't have to use one, loop over every starting position and extract the substrings directly:
// If the string is just plain digits, skip dataArray and the
// for (String s : dataArray) loop, and use data in place of s below
String data = "123456 some other datas 654321";
String[] dataArray = data.split("\\D+");
for (String s : dataArray) {
    for (int length = 3; length <= 5; length++) {
        for (int index = 0; index <= s.length() - length; index++) {
            int maxIndex = index + length;
            System.out.println(s.substring(index, maxIndex));
        }
    }
}
Output:
123
234
345
456
1234
2345
3456
12345
23456
654
543
432
321
6543
5432
4321
65432
54321
Upvotes: 2
Reputation: 39355
Try this solution, which matches lengths 3 to 5 only. I am using lookaheads to find the overlapping matches in the string. In practice there are three regexes here, one per lookahead.
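import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String text = "123456";
Pattern pattern = Pattern.compile("(?=(\\d\\d\\d))(?=(\\d\\d\\d\\d?))(?=(\\d\\d\\d\\d?\\d?))");
Matcher m = pattern.matcher(text);
// take all the captures from the Matcher; near the end of the string the
// optional digits match nothing, so the groups repeat the shorter value
List<String> list = new ArrayList<String>();
while (m.find()) {
    list.add(m.group(1));
    list.add(m.group(2));
    list.add(m.group(3));
}
// make the list unique using HashSet (this also merges equal numbers
// that were found at different positions)
list = new ArrayList<String>(new HashSet<String>(list));
// print
for (String s : list) {
    System.out.println(s);
}
// output is not sorted; if you want, you can sort the List<>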
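The sorting mentioned in the last comment could look roughly like this: order by length first, then by value. The helper name `sortByLengthThenValue` is made up for the example, and it assumes every entry is a plain digit string, so comparing equal-length strings as text gives the same order as comparing them as numbers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper class, not part of the original answer.
class SortMatches {
    static List<String> sortByLengthThenValue(List<String> matches) {
        List<String> sorted = new ArrayList<>(matches); // leave the input untouched
        sorted.sort(Comparator.comparingInt(String::length)
                .thenComparing(Comparator.<String>naturalOrder()));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> unsorted = Arrays.asList("2345", "456", "123", "12345", "234");
        sortByLengthThenValue(unsorted).forEach(System.out::println);
    }
}
```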
Upvotes: 0