Reputation: 16651
I'm looking for a regex that will extract all possible amounts from a string, assuming that amounts always contain 2 decimals, and accepting either . or , liberally as separators. For example, for the following string, I would like to find the amounts below:
1.234,567.89
1.23
1.234,56
234,56
34,56
4,56
1.234,567.89
234,567.89
34,567.89
4,567.89
567.89
67.89
7.89
Is this achievable with regex?
My current regex is ?\\d{1,3}([\\.,]\\d{3})*([\\.,]\\d{2}), but that obviously doesn't work because it only returns 1 match.
Upvotes: 4
Views: 98
Reputation: 22837
Initially, I thought this couldn't be done with regex. I was wrong, but it is only possible if you remove duplicates and empty strings/nulls from the results afterwards.
(?=(\d+[.,]\d{2}))(?=((?:\d+[.,]){2,}?\d{2})?)(?=((?:\d+[.,])+\d{2}))
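To see why lookaheads are used here, it may help to compare a consuming match with a lookahead on the same sub-pattern. The sketch below is only an illustration (the class name OverlapDemo and its method names are made up for this example). It runs the first lookahead's inner pattern both ways on the OP's string: the consuming version skips over everything it has already matched, while the lookahead version matches an empty string at each position and so can report overlapping captures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlapDemo {
    // Consuming match: each find() resumes where the previous match ended,
    // so amounts that overlap an earlier match are never reported.
    static List<String> consuming(String s) {
        Matcher m = Pattern.compile("\\d+[.,]\\d{2}").matcher(s);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Lookahead match: the overall match is empty, so find() advances one
    // character at a time, and group 1 still records what the lookahead saw.
    static List<String> overlapping(String s) {
        Matcher m = Pattern.compile("(?=(\\d+[.,]\\d{2}))").matcher(s);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "1.234,567.89";
        // 3 values: 1.23 | 4,56 | 7.89
        System.out.println(String.join(" | ", consuming(s)));
        // 7 values: 1.23 | 234,56 | 34,56 | 4,56 | 567.89 | 67.89 | 7.89
        System.out.println(String.join(" | ", overlapping(s)));
    }
}
```

A single lookahead only yields one capture per starting position, which is why the full pattern stacks several of them to pick up the longer variants as well.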
The full pattern contains 3 positive lookaheads. They are as follows:

(?=(\d+[.,]\d{2})) - Ensure the following matches
  (\d+[.,]\d{2}) - Capture the following into capture group 1. This captures shorter variants like 1.23.
    \d+ - Match one or more digits
    [.,] - Match either . or , literally
    \d{2} - Match exactly 2 digits

(?=((?:\d+[.,]){2,}?\d{2})?) - Ensure the following matches
  ((?:\d+[.,]){2,}?\d{2})? - Optionally capture the following into capture group 2. This captures the in-between numbers like 1.234,56, where a shorter version and a longer version exist at the same location (1.23 and 1.234,567.89 respectively). If longer numbers are possible, you may need to add more positive lookaheads identical to this one, changing {2,} in the section below to {3,}, {4,}, etc., and add those groups to the while loop. If no in-between number exists, it'll simply capture the same as the third lookahead (the duplicate is removed in the code), or nothing at all (null) when fewer than two separators follow.
    (?:\d+[.,]){2,}? - Match the following 2 or more times, but as few as possible
      \d+ - Match one or more digits
      [.,] - Match either . or , literally
    \d{2} - Match exactly 2 digits

(?=((?:\d+[.,])+\d{2})) - Ensure the following matches
  ((?:\d+[.,])+\d{2}) - Capture the following into capture group 3. This captures longer variants like 1.234,567.89.
    (?:\d+[.,])+ - Match the following one or more times
      \d+ - Match one or more digits
      [.,] - Match either . or , literally
    \d{2} - Match exactly 2 digits

The code below simply iterates over all the matches and adds each group's value to a List. null values are then removed from the list using the method found here. The list is then turned into a Set to remove duplicates according to this answer and added back to the (now empty) list.
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Main {
    public static void main(String[] args) {
        String s = "1.234,567.89";
        // Three lookaheads: short (group 1), in-between (group 2), long (group 3).
        Pattern p = Pattern.compile("(?=(\\d+[.,]\\d{2}))(?=((?:\\d+[.,]){2,}?\\d{2})?)(?=((?:\\d+[.,])+\\d{2}))");
        Matcher m = p.matcher(s);
        List<String> al = new ArrayList<>();
        Set<String> hs = new HashSet<>();

        // Each match is empty; the amounts are in the capture groups.
        while (m.find()) {
            al.add(m.group(1));
            al.add(m.group(2)); // null when no in-between amount exists
            al.add(m.group(3));
        }

        // Drop the nulls left by the optional group 2.
        al.removeAll(Collections.singleton(null));

        // Round-trip through a Set to drop duplicates.
        hs.addAll(al);
        al.clear();
        al.addAll(hs);

        System.out.println(al);
    }
}
The values in the result below match the OP's list. The order differs because a HashSet does not preserve insertion order, but you can confirm the match by cross-checking both sets of values.
[34,567.89, 4,56, 7.89, 234,56, 4,567.89, 1.234,56, 1.23, 1.234,567.89, 34,56, 567.89, 67.89, 234,567.89]
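As noted above, the three-lookahead pattern needs an extra lookahead for every additional separator a number may contain. If that becomes unwieldy, one possible alternative is to drop the lookaheads and test every substring against the group 3 sub-pattern. This is a different, brute-force technique rather than the approach used in this answer. The sketch below assumes that is acceptable for short inputs; the class name AllAmounts and the method extract are hypothetical. A LinkedHashSet is used so that duplicates are dropped while the discovery order is kept.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

class AllAmounts {
    // Same shape as capture group 3: one or more "digits + separator"
    // groups followed by exactly 2 decimals.
    private static final Pattern AMOUNT = Pattern.compile("(?:\\d+[.,])+\\d{2}");

    // Brute force: test every substring; quadratic in the input length.
    static List<String> extract(String s) {
        Set<String> seen = new LinkedHashSet<>(); // de-duplicates, keeps order
        for (int start = 0; start < s.length(); start++) {
            for (int end = start + 1; end <= s.length(); end++) {
                String candidate = s.substring(start, end);
                // matches() requires the whole candidate to fit the pattern.
                if (AMOUNT.matcher(candidate).matches()) {
                    seen.add(candidate);
                }
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Prints the 12 amounts from the question, one per line.
        for (String amount : extract("1.234,567.89")) {
            System.out.println(amount);
        }
    }
}
```

For the example string this should give the same 12 values as the lookahead version, ordered by start position and then by length. The trade-off is that it checks every substring, so it is best suited to short strings.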
Upvotes: 5