Jon

Reputation: 31

Sorting Strings that contain integers with comparator

I have a comparator that sorts an array of strings that contain letters and numbers, but can't seem to identify the regular expression that sorts them in the manner I am looking for.

I have used this question as a reference for my comparator.

List<String> array = Arrays.asList("string-a01", "string-a20", "string-a100", "string-b01", "string-b20", "string-b100", "string-c01", "string-c20", "string-c100"); // etc.

Collections.sort(array, new Comparator<String>(){       
    public int compare(String o1, String o2) {
        return extractInt(o1) - extractInt(o2);
    }

    int extractInt(String s) {
        String num = s.replaceAll("\\D", "");
        return num.isEmpty() ? 0 : Integer.parseInt(num);
    }
});
        
for (String element : array) {
    System.out.println(element);
}

Before introducing the comparator, the output was:
string-a01, string-a100, string-a20, string-b01, string-b100, string-b20, string-c01, string-c100, string-c20

The output that this code produces is:
string-a01, string-b01, string-c01, string-a20, string-b20, string-c20, string-a100, string-b100, string-c100

The output I would like it to produce is:
string-a01, string-a20, string-a100, string-b01, string-b20, string-b100, string-c01, string-c20, string-c100


EDIT: Edited for clarification. The array has been changed, and the output from before the comparator was introduced has been added.

Upvotes: 1

Views: 2233

Answers (3)

Jorn Vernee

Reputation: 33875

Assuming that the string part is actually something other than just "string", you can extract the letter part of the ending and the digit part, and compare those using a composite Comparator:

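import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

String[] array = { "string-a20", "string-a01", "string-b01",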
    "string-b20", "string-c01", "string-c20",
    "string-a100", "string-b100", "string-c100" };

Pattern p = Pattern.compile("^.*?-([A-Za-z]+)(\\d+)$");

List<String> result = Arrays.stream(array)
    .map(p::matcher)
    .filter(Matcher::find)
    .sorted(Comparator.comparing((Matcher m) -> m.group(1)) // Compare the letter part
        .thenComparingInt(m -> Integer.parseInt(m.group(2)))) // Compare the number part
    .map(m -> m.group(0)) // Map back to String
    .collect(Collectors.toList());

System.out.println(result);

Output:

[string-a01, string-a20, string-a100, string-b01, string-b20, string-b100, string-c01, string-c20, string-c100]

Legacy version (with the downside of having to recreate the Matchers on every comparison):

Arrays.sort(array, new Comparator<String>() {

    Pattern p = Pattern.compile("^.*?-([A-Za-z]+)(\\d+)$");

    @Override
    public int compare(String o1, String o2) {
        Matcher m1 = p.matcher(o1);
        Matcher m2 = p.matcher(o2);

        if(!(m1.find() && m2.find()))
            return 0; // Or throw a format exception

        int comparison = m1.group(1).compareTo(m2.group(1));
        return comparison != 0
            ? comparison 
            : Integer.compare(Integer.parseInt(m1.group(2)), Integer.parseInt(m2.group(2)));
    }

});
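The same composite idea can also be packaged as a reusable Comparator<String>. The following is only a sketch: the class name, the `match` helper, and the constant name are hypothetical, and throwing on a non-matching string is an assumption (the stream version above silently filters such strings out instead). Like the legacy version, it creates new Matchers for every comparison.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SuffixComparatorSketch {
    // Same pattern as above: group 1 is the letter part, group 2 the digit part.
    private static final Pattern P = Pattern.compile("^.*?-([A-Za-z]+)(\\d+)$");

    // Hypothetical helper: returns a Matcher that has already matched,
    // or throws if the string does not have the expected format (an assumption).
    static Matcher match(String s) {
        Matcher m = P.matcher(s);
        if (!m.find()) {
            throw new IllegalArgumentException("Unexpected format: " + s);
        }
        return m;
    }

    // Compare the letter part first, then the number part as an int.
    static final Comparator<String> BY_LETTERS_THEN_NUMBER =
        Comparator.comparing((String s) -> match(s).group(1))
                  .thenComparingInt(s -> Integer.parseInt(match(s).group(2)));

    public static void main(String[] args) {
        String[] array = { "string-a20", "string-a01", "string-b01",
            "string-b20", "string-c01", "string-c20",
            "string-a100", "string-b100", "string-c100" };
        Arrays.sort(array, BY_LETTERS_THEN_NUMBER);
        System.out.println(Arrays.toString(array));
    }
}
```

This sorts the array in place, so it also works with Collections.sort on a List<String>.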

Upvotes: 3

Andy Turner

Reputation: 140484

It sounds like you want to order the strings on the "leading strings", i.e. everything up to the digits; if the leading strings are equal, then compare on the subsequent digits.

To split the string into its "string" and "integer" parts, you can first find the "first trailing digit", i.e. the position of the first character in the string where there are no non-digits between it and the end of the string:

int firstTrailingDigit(String s) {
  int i = s.length();
  while (i > 0 && Character.isDigit(s.charAt(i - 1))) {
    --i;
  }
  return i;
}

You can then use this in your comparator:

public int compare(String a, String b) {
  int ftdA = firstTrailingDigit(a);
  int ftdB = firstTrailingDigit(b);

  // Get the leading strings, and compare.
  String sA = a.substring(0, ftdA);
  String sB = b.substring(0, ftdB);
  int compareStrings = sA.compareTo(sB);
  if (compareStrings != 0) {
    // If they're not equal, return the result of the comparison.
    return compareStrings;
  }

  // Get the trailing numbers from the strings, and compare.
  int iA = Integer.parseInt(a.substring(ftdA));
  int iB = Integer.parseInt(b.substring(ftdB));
  return Integer.compare(iA, iB);
}

Ideone demo

Input:

String[] array = {"string-a01","string-a20","string-a100","string-b01","string-b20","string-b100","string-c01","string-c20","string-c100"};

Output:

[string-a01, string-a20, string-a100, string-b01, string-b20, string-b100, string-c01, string-c20, string-c100]
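One edge case worth noting: if a string has no trailing digits, `a.substring(ftdA)` is empty and Integer.parseInt throws a NumberFormatException. Below is a minimal sketch of a guarded variant. The class name and the `trailingNumber` helper are hypothetical, and treating a missing number as 0 is an assumption borrowed from the `extractInt` method in the question.

```java
import java.util.Arrays;
import java.util.Comparator;

class TrailingNumberSketch {
    // Index of the first character of the trailing run of digits
    // (equal to s.length() when the string does not end in a digit).
    static int firstTrailingDigit(String s) {
        int i = s.length();
        while (i > 0 && Character.isDigit(s.charAt(i - 1))) {
            --i;
        }
        return i;
    }

    // Hypothetical helper: parses the trailing digits, treating "no digits" as 0 (assumption).
    static int trailingNumber(String s) {
        String digits = s.substring(firstTrailingDigit(s));
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    // Compare the leading strings first; fall back to the trailing numbers on a tie.
    static final Comparator<String> COMPARATOR = (a, b) -> {
        String sA = a.substring(0, firstTrailingDigit(a));
        String sB = b.substring(0, firstTrailingDigit(b));
        int compareStrings = sA.compareTo(sB);
        return compareStrings != 0
            ? compareStrings
            : Integer.compare(trailingNumber(a), trailingNumber(b));
    };

    public static void main(String[] args) {
        String[] array = {"string-a20", "string-a", "string-a100", "string-a01"};
        Arrays.sort(array, COMPARATOR);
        // "string-a" has no number, so it is treated as 0 and sorts first.
        System.out.println(Arrays.toString(array));
        // prints [string-a, string-a01, string-a20, string-a100]
    }
}
```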

Upvotes: 1

Mena

Reputation: 48434

You are removing the alphabetical characters in your extractInt method, so you won't be able to use them in the comparison.
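A small sketch of what that means in practice, reusing the `extractInt` logic from the question (the class name is only for illustration): two strings that differ only in their letter produce the same key, so the comparator considers them equal.

```java
class ExtractIntDemo {
    // Same logic as the question's extractInt: strip every non-digit, then parse.
    static int extractInt(String s) {
        String num = s.replaceAll("\\D", "");
        return num.isEmpty() ? 0 : Integer.parseInt(num);
    }

    public static void main(String[] args) {
        System.out.println(extractInt("string-a20")); // 20
        System.out.println(extractInt("string-b20")); // 20
        // The question's compare() returns this difference, which is 0 here:
        System.out.println(extractInt("string-a20") - extractInt("string-b20")); // 0
    }
}
```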

You should just sort them with no Comparator, which will sort them using the default lexicographical ordering (java.lang.String implements Comparable<String>).

Example

// test array
String[] s = {"string-a01","string-a01","string-b01","string-b02","string-c02","string-c02"};

// sorting without a Comparator uses the natural ordering of the elements,
// which works because String implements Comparable
Arrays.sort(s);

// printing in human-readable form
System.out.println(
    Arrays.toString(s)
);

Output

[string-a01, string-a01, string-b01, string-b02, string-c02, string-c02]

Notes

  • If you want to remove duplicates (which might be your intent from the question - not clear), add the array elements to a TreeSet instead:

    Set<String> deduplicated = new TreeSet<>(Arrays.asList(s));
    
  • If your sorting algorithm must act so that 2 comes before 12, then you need to extract the integer value without removing it from the elements, and compare it only when the rest of the Strings are equal.
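To illustrate the last note, here is a short sketch (class and method names are hypothetical) showing how the default ordering treats numbers of different lengths, using three of the values from the question: characters are compared one by one, so "100" sorts before "20".

```java
import java.util.Arrays;

class DefaultOrderDemo {
    // Returns a copy of the values sorted by String's natural (lexicographical) ordering.
    static String[] sortDefault(String... values) {
        String[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        String[] sorted = sortDefault("string-a01", "string-a20", "string-a100");
        System.out.println(Arrays.toString(sorted));
        // prints [string-a01, string-a100, string-a20]

        // '1' < '2' at the first differing character, so a100 comes before a20.
        System.out.println("string-a100".compareTo("string-a20") < 0); // true
    }
}
```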

Upvotes: 1
