Erel Segal-Halevi
Erel Segal-Halevi

Reputation: 36745

sorting a java list using a scores table

I have a list of N strings, and a parallel list of N scores. I need to sort the strings using the scores in the table. How do I do that?

My current solution is to use an auxiliary list of indices, like this:

public static List<String> sortByScores(List<String> strings, final List<Float> scores) {
    List<Integer> indices = new ArrayList<Integer>(strings.size());
    for (int i=0; i<strings.size(); i++) 
        indices.add(i);
    Collections.sort(indices, new Comparator<Integer>() {
        @Override public int compare(Integer arg0, Integer arg1) {  // sort in descending order
            return -scores.get(arg0).compareTo(scores.get(arg1));
        }
    });
    List<String> sortedStrings = new ArrayList<String>(strings.size());
    for (int i=0; i<indices.size(); ++i)
        sortedStrings.add(strings.get(indices.get(i)));
    return sortedStrings;
}

It works, but seems inefficient.

Is there a better solution?

Upvotes: 1

Views: 1091

Answers (4)

Ray Toal
Ray Toal

Reputation: 88378

Pseudocode

// Precondition: length of each list is the same, call it N
let m = new TreeMap<Integer, List<String>>()
for i in 0 .. N-1
    if m.containsKey(scores[i])
        m.get(scores[i]).append(strings[i])
    else
        m.put(scores[i], a new list containing the sole element strings[i])
    end if
end if

for each entry (k, v) in m
    output all the strings in v
end

No need to sort or define comparables or anything, because the treemap is already sorted on the scores!

Upvotes: 3

Erel Segal-Halevi
Erel Segal-Halevi

Reputation: 36745

OK, I tested all methods you suggested using a random collection of strings:

public static void testSortByScores(int count) {
    int length = 4;
    // Create a random array and random scores:
    List<String> strings = new ArrayList<String>(count);
    List<Float> scores = new ArrayList<Float>(count);
    RandomString randomString = new RandomString(length);
    String letters = "abcdefghijklmnopqrstuvwxyz";
    for (int iString=0; iString<count; ++iString) {
        StringBuffer randomStringBuffer = new StringBuffer(length);
        int score = 0;
        for (int iChar=0; iChar<length; ++iChar) {
            int index = (int)(Math.random()*letters.length());
            char c = letters.charAt(index);
            randomStringBuffer.append(c);
            score += index;
        }
        strings.add(randomStringBuffer.toString());
        scores.add((float)score);
    }


    long start = System.currentTimeMillis();
    strings = sortByScoresUsingIndices(strings,scores);
    //strings = sortByScoresUsingClass(strings,scores);
    //strings = sortByScoresUsingTree(strings,scores);
    System.out.println("sorting "+count+" took "+(System.currentTimeMillis()-start)+" ms.");
}

and here are the results:

My method - sortByScoresUsingIndices - is probably the worse:

sorting 10000 took 52 ms.
sorting 30000 took 140 ms.
sorting 100000 took 396 ms.
sorting 300000 took 382 ms.
sorting 1000000 took 1122 ms.
sorting 3000000 took 5096 ms.

Then comes the method using ScoreClass, which I implemented like this:

public static List<String> sortByScoresUsingClass(List<String> strings, final List<Float> scores) {
    List<ScoreClass> list = new ArrayList<ScoreClass>(strings.size());
    for (int i=0; i<strings.size(); i++) {
        ScoreClass sc = new ScoreClass(strings.get(i),scores.get(i));
        list.add(sc);
    }
    Collections.sort(list);
    List<String> sortedStrings = new ArrayList<String>(strings.size());
    for (ScoreClass item: list)
        sortedStrings.add(item.myString);
    return sortedStrings;
}


sorting 10000 took 60 ms.
sorting 30000 took 121 ms.
sorting 100000 took 40 ms.
sorting 300000 took 280 ms.
sorting 1000000 took 648 ms.
sorting 3000000 took 3254 ms.

and the best one is the method using the TreeMap, but I had to change it and use a list, because there might be more than one string with the same score:

public static List<String> sortByScoresUsingTree(List<String> strings, final List<Float> scores) {
    TreeMap<Float,List<String>> treeMap = new TreeMap<Float,List<String>>();
    for (int i=0; i<strings.size(); i++) {
        Float key = -scores.get(i);
        if (treeMap.get(key)==null)
            treeMap.put(key, new LinkedList<String>());
        treeMap.get(key).add(strings.get(i));
    }
    List<String> sortedStrings = new ArrayList<String>(strings.size());
    for (List<String> set: treeMap.values()) {
        sortedStrings.addAll(set);
    }
    return sortedStrings;
}

And the results are:

sorting 10000 took 29 ms.
sorting 30000 took 16 ms.
sorting 100000 took 25 ms.
sorting 300000 took 229 ms.
sorting 1000000 took 374 ms.
sorting 3000000 took 2723 ms.

Upvotes: 0

sternr
sternr

Reputation: 6506

I'd create a new POJO containing a String and its Score, and have it implement Comparable

Upvotes: 2

Jesus Ramos
Jesus Ramos

Reputation: 23268

Put the string and the score into one class and implement the Comparable interface this way you sort on the score but you can access the string once it's sorted (seems the most efficient to me).

Example:

public class ScoreClass implements Comparable<ScoreClass>
{
    String myString;
    float score;

    public int compareTo(ScoreClass c)
    {
        return Float.compare(this.score, c.score);
    }
}

This is brain compiled code so let me know if something is wrong.

Upvotes: 2

Related Questions