Reputation: 31
I have a comparator that sorts an array of strings that contain letters and numbers, but can't seem to identify the regular expression that sorts them in the manner I am looking for.
I have used this question as a reference for my comparator.
array={string-a01,string-a20,string-a100,string-b01,string-b20,string-b100,string-c01,string-c20,string-c100 etc.}
Collections.sort(array, new Comparator<String>(){
    public int compare(String o1, String o2) {
        return extractInt(o1) - extractInt(o2);
    }
    int extractInt(String s) {
        String num = s.replaceAll("\\D", "");
        return num.isEmpty() ? 0 : Integer.parseInt(num);
    }
});
for (String element : array) {
    System.out.println(element);
}
Before introducing the comparator, the output was:
string-a01, string-a100, string-a20, string-b01, string-b100, string-b20, string-c01, string-c20, string-c100
The output that this code produces is:
string-a01, string-b01, string-c01, string-a20, string-b20, string-c20, string-a100, string-b100, string-c100
The output I would like it to produce is:
string-a01, string-a20, string-a100, string-b01, string-b20, string-b100, string-c01, string-c20, string-c100
EDIT: Edited for clarification. Array has been changed and output before the comparator was added.
Upvotes: 1
Views: 2233
Reputation: 33875
Assuming that the "string" part is actually something other than just "string", you can extract the letter part of the ending and the digit part, and compare those using a composite Comparator:
// Needs imports from java.util, java.util.regex and java.util.stream (Collectors)
String[] array = { "string-a20", "string-a01", "string-b01",
        "string-b20", "string-c01", "string-c20",
        "string-a100", "string-b100", "string-c100" };
Pattern p = Pattern.compile("^.*?-([A-Za-z]+)(\\d+)$");
List<String> result = Arrays.stream(array)
        .map(p::matcher)
        .filter(Matcher::find) // Strings that do not match the pattern are dropped
        .sorted(Comparator.comparing((Matcher m) -> m.group(1)) // Compare the letter part
                .thenComparingInt(m -> Integer.parseInt(m.group(2)))) // Compare the number part
        .map(m -> m.group(0)) // Map back to String
        .collect(Collectors.toList());
System.out.println(result);
Output:
[string-a01, string-a20, string-a100, string-b01, string-b20, string-b100, string-c01, string-c20, string-c100]
Legacy version (with the downside of having to recreate Matchers):
Arrays.sort(array, new Comparator<String>() {
    Pattern p = Pattern.compile("^.*?-([A-Za-z]+)(\\d+)$");
    @Override
    public int compare(String o1, String o2) {
        Matcher m1 = p.matcher(o1);
        Matcher m2 = p.matcher(o2);
        if(!(m1.find() && m2.find()))
            return 0; // Or throw a format exception
        int comparison = m1.group(1).compareTo(m2.group(1));
        return comparison != 0
                ? comparison
                : Integer.compare(Integer.parseInt(m1.group(2)), Integer.parseInt(m2.group(2)));
    }
});
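One way to sidestep the repeated Matcher creation is to parse every string once into a small key object, sort the keys, and map back. This is only a sketch: the class name PrecomputedKeySort, the Key type and the method sortByLettersThenNumber are hypothetical names introduced here, and throwing on a non-matching string is an assumed policy (the stream version above drops such strings instead).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrecomputedKeySort {
    // Same pattern as in the answer: some text, a hyphen, letters, then digits.
    private static final Pattern P = Pattern.compile("^.*?-([A-Za-z]+)(\\d+)$");

    // Hypothetical helper type: the parsed parts of one input string.
    static final class Key {
        final String original;
        final String letters;
        final int number;

        Key(String original, String letters, int number) {
            this.original = original;
            this.letters = letters;
            this.number = number;
        }
    }

    public static String[] sortByLettersThenNumber(String[] input) {
        List<Key> keys = new ArrayList<>();
        for (String s : input) {
            Matcher m = P.matcher(s); // exactly one Matcher per element
            if (!m.find()) {
                // Assumed policy: reject input that does not fit the pattern.
                throw new IllegalArgumentException("Unexpected format: " + s);
            }
            keys.add(new Key(s, m.group(1), Integer.parseInt(m.group(2))));
        }
        // Letters first, then the numeric value of the digits.
        keys.sort(Comparator.comparing((Key k) -> k.letters)
                .thenComparingInt(k -> k.number));
        String[] result = new String[keys.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = keys.get(i).original;
        }
        return result;
    }

    public static void main(String[] args) {
        String[] array = { "string-a20", "string-a01", "string-b01",
                "string-b20", "string-c01", "string-c20",
                "string-a100", "string-b100", "string-c100" };
        System.out.println(Arrays.toString(sortByLettersThenNumber(array)));
    }
}
```

The comparator then only touches already-parsed fields, so the regex work is linear in the number of elements rather than being repeated on every comparison.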
Upvotes: 3
Reputation: 140484
It sounds like you want to order the strings on the "leading strings", i.e. everything up to the digits; if the leading strings are equal, then compare on the subsequent digits.
To split the string into its "string" and "integer" parts, you can first find the "first trailing digit", i.e. the position of the first character in the string where there are no non-digits between it and the end of the string:
int firstTrailingDigit(String s) {
    int i = s.length();
    while (i > 0 && Character.isDigit(s.charAt(i - 1))) {
        --i;
    }
    return i;
}
You can then use this in your comparator:
public int compare(String a, String b) {
    int ftdA = firstTrailingDigit(a);
    int ftdB = firstTrailingDigit(b);
    // Get the leading strings, and compare.
    String sA = a.substring(0, ftdA);
    String sB = b.substring(0, ftdB);
    int compareStrings = sA.compareTo(sB);
    if (compareStrings != 0) {
        // If they're not equal, return the result of the comparison.
        return compareStrings;
    }
    // Get the trailing numbers from the strings, and compare.
    int iA = Integer.parseInt(a.substring(ftdA));
    int iB = Integer.parseInt(b.substring(ftdB));
    return Integer.compare(iA, iB);
}
Input:
String[] array = {"string-a01","string-a20","string-a100","string-b01","string-b20","string-b100","string-c01","string-c20","string-c100"};
Output:
[string-a01, string-a20, string-a100, string-b01, string-b20, string-b100, string-c01, string-c20, string-c100]
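For completeness, here is a rough sketch of how the two pieces above might be wired into one runnable class. The class name TrailingNumberSort, the constant BY_PREFIX_THEN_NUMBER and the helper trailingNumber are names made up for this sketch. Treating a string without trailing digits as 0 is an assumption added here, since Integer.parseInt would otherwise throw on the empty substring.

```java
import java.util.Arrays;
import java.util.Comparator;

public class TrailingNumberSort {
    // Index of the first character of the trailing run of digits
    // (equals s.length() when the string does not end in a digit).
    static int firstTrailingDigit(String s) {
        int i = s.length();
        while (i > 0 && Character.isDigit(s.charAt(i - 1))) {
            --i;
        }
        return i;
    }

    // Assumption (not in the answer): no trailing digits counts as 0.
    static int trailingNumber(String s, int from) {
        return from == s.length() ? 0 : Integer.parseInt(s.substring(from));
    }

    // Compare the leading text first, then the trailing number.
    static final Comparator<String> BY_PREFIX_THEN_NUMBER = (a, b) -> {
        int ftdA = firstTrailingDigit(a);
        int ftdB = firstTrailingDigit(b);
        int byPrefix = a.substring(0, ftdA).compareTo(b.substring(0, ftdB));
        if (byPrefix != 0) {
            return byPrefix;
        }
        return Integer.compare(trailingNumber(a, ftdA), trailingNumber(b, ftdB));
    };

    public static void main(String[] args) {
        String[] array = { "string-c100", "string-a20", "string-b01",
                "string-a100", "string-a01", "string-b20" };
        Arrays.sort(array, BY_PREFIX_THEN_NUMBER);
        // prints [string-a01, string-a20, string-a100, string-b01, string-b20, string-c100]
        System.out.println(Arrays.toString(array));
    }
}
```

Note that a digit run too long for an int would still make Integer.parseInt throw; the sketch does not try to handle that case.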
Upvotes: 1
Reputation: 48434
You are removing the alphabetical characters in your extractInt method, so you won't be able to use them in the comparison.
You should just sort them with no Comparator, which will sort them using the default, lexicographical sorting algorithm (java.lang.String implements Comparable<String>).
Example
// test array
String[] s = {"string-a01","string-a01","string-b01","string-b02","string-c02","string-c02"};
// sorting with null Comparator, will sort if the type implements Comparable -
// which String does
Arrays.sort(s);
// printing in human-readable form
System.out.println(
    Arrays.toString(s)
);
Output
[string-a01, string-a01, string-b01, string-b02, string-c02, string-c02]
Notes
If you want to remove duplicates (which might be your intent from the question - not clear), add the array elements to a TreeSet instead:
Set<String> deduplicated = new TreeSet<>(Arrays.asList(s));
If your sorting algorithm must act so that 2 comes before 12, then you need to extract the integer value without removing it from the elements, and compare it only when the rest of the Strings are equal.
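To make the two observations in this answer concrete, here is a small demonstration. The class name SortPitfalls is made up for this sketch, and extractInt is copied from the question. It shows that extractInt maps differently lettered strings to the same number, and that the default lexicographical order compares digits character by character.

```java
import java.util.Arrays;

public class SortPitfalls {
    // extractInt as written in the question: every non-digit is discarded.
    static int extractInt(String s) {
        String num = s.replaceAll("\\D", "");
        return num.isEmpty() ? 0 : Integer.parseInt(num);
    }

    public static void main(String[] args) {
        // Both strings reduce to 20, so a comparator built only on extractInt
        // cannot tell the "a" entry from the "b" entry.
        System.out.println(extractInt("string-a20") == extractInt("string-b20")); // prints true

        String[] s = { "string-a20", "string-a100", "string-a01" };
        Arrays.sort(s); // natural (lexicographical) String order
        // '1' sorts before '2' as a character, so a100 lands before a20.
        System.out.println(Arrays.toString(s)); // prints [string-a01, string-a100, string-a20]
    }
}
```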
Upvotes: 1