Reputation: 187
I'm trying to sort a string by the number of occurrences of each of its characters, with the most frequent at the beginning and rarest at the end. After sorting, I would need to remove all the character repeats. Because examples are always clearer, the program should do the following:
String str = "aebbaaahhhhhhaabbbccdfffeegh";
String output = sortByCharacterOccurrencesAndTrim(str);
In this case, the 'sortByCharacterOccurrencesAndTrim' method should return:
String output = "habefcdg"
In the case where 2 characters have the same occurrence, their order in the returned string doesn't matter. So "habefcdg" can also equal "habfecgd", because both 'f' and 'e' occur 3 times and both 'd' and 'g' occur once.
"habefcdg" would effectively be the same as "habfecgd"
Note: I would like to point out that performance matters in this case, so I would prefer the most efficient method possible. I say this because the string length can range from 1 to the max length (which I think is the same as Integer.MAX_VALUE, but not sure), so I want to minimize any potential bottlenecks.
Upvotes: 9
Views: 3845
Reputation: 235994
Just for fun (and I'm not claiming this is the most efficient solution): how about some Java 8 lambdas + parallel streams?
public String sortByCharacterOccurrencesAndTrim(String str) {
// build a frequency map, for each code point store its count
Map<Integer, Long> frequencies =
str.codePoints()
.parallel()
.boxed()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
// sort by descending frequency and collect code points into array
int[] output =
frequencies.entrySet()
.parallelStream()
.sorted(Map.Entry.<Integer, Long>comparingByValue().reversed())
.mapToInt(Map.Entry::getKey)
.toArray();
// create output string from code point array
return new String(output, 0, output.length);
}
If you want a super-efficient solution you can rewrite the above algorithm using explicit loops, but that's a lot of code and it's late for me :) . The idea, however, will be the same: build a char frequency map, sort by frequency using descending order and build a string with the chars.
Upvotes: 3
Reputation: 159086
Note: This is not an answer, but simply showing performance test code of answers by Jim Mischel and Óscar López (with parallel streams in response to comment by OP).
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Function;
import java.util.stream.Collectors;
public class Test {
public static void main(String[] args) {
long start = System.currentTimeMillis();
String s = buildString();
System.out.println("buildString: " + (System.currentTimeMillis() - start) + "ms");
start = System.currentTimeMillis();
String result1 = testUsingArray(s);
System.out.println("testUsingArray: " + (System.currentTimeMillis() - start) + "ms");
start = System.currentTimeMillis();
String result2 = testUsingMap(s);
System.out.println("testUsingMap: " + (System.currentTimeMillis() - start) + "ms");
start = System.currentTimeMillis();
String result3 = testUsingStream(s);
System.out.println("testUsingStream: " + (System.currentTimeMillis() - start) + "ms");
start = System.currentTimeMillis();
String result4 = testUsingParallelStream(s);
System.out.println("testUsingParallelStream: " + (System.currentTimeMillis() - start) + "ms");
System.out.println(result1);
System.out.println(result2);
System.out.println(result3);
System.out.println(result4);
}
private static String buildString() {
Random rnd = new Random();
char[] buf = new char[100_000_000];
for (int i = 0; i < buf.length; i++)
buf[i] = (char)(rnd.nextInt(127 - 33) + 33);
return new String(buf);
}
private static String testUsingArray(String s) {
int[] count = new int[65536];
for (int i = 0; i < s.length(); i++)
count[s.charAt(i)]++;
List<CharCount> list = new ArrayList<>();
for (int i = 0; i < 65536; i++)
if (count[i] != 0)
list.add(new CharCount((char)i, count[i]));
Collections.sort(list);
char[] buf = new char[list.size()];
for (int i = 0; i < buf.length; i++)
buf[i] = list.get(i).ch;
return new String(buf);
}
private static String testUsingMap(String s) {
Map<Character, CharCount> map = new HashMap<>();
for (int i = 0; i < s.length(); i++)
map.computeIfAbsent(s.charAt(i), CharCount::new).count++;
List<CharCount> list = new ArrayList<>(map.values());
Collections.sort(list);
char[] buf = new char[list.size()];
for (int i = 0; i < buf.length; i++)
buf[i] = list.get(i).ch;
return new String(buf);
}
private static String testUsingStream(String s) {
int[] output = s.codePoints()
.boxed()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
.entrySet()
.stream()
.sorted(Map.Entry.<Integer, Long>comparingByValue().reversed())
.mapToInt(Map.Entry::getKey)
.toArray();
return new String(output, 0, output.length);
}
private static String testUsingParallelStream(String s) {
int[] output = s.codePoints()
.parallel()
.boxed()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
.entrySet()
.parallelStream()
.sorted(Map.Entry.<Integer, Long>comparingByValue().reversed())
.mapToInt(Map.Entry::getKey)
.toArray();
return new String(output, 0, output.length);
}
}
class CharCount implements Comparable<CharCount> {
final char ch;
int count;
CharCount(char ch) {
this.ch = ch;
}
CharCount(char ch, int count) {
this.ch = ch;
this.count = count;
}
@Override
public int compareTo(CharCount that) {
return Integer.compare(that.count, this.count); // descending
}
}
Sample Output
buildString: 974ms
testUsingArray: 48ms
testUsingMap: 216ms
testUsingStream: 1279ms
testUsingParallelStream: 442ms
UOMP<FV{KHt`(-q6;Gl'R9nxy+.Y[=2a7^45v?E@e,>|AD_\ILpJ}8sow"Z&bCmNW1$!Sd0c]~g3BjX#fz:Q*Tkui%/r)h
UOMP<FV{KHt`(-q6;Gl'R9nxy+.Y[=2a7^45v?E@e,>|AD_\ILpJ}8sow"Z&bCmNW1$!Sd0c]~g3BjX#fz:Q*Tkui%/r)h
UOMP<FV{KHt`(-q6;Gl'R9nxy+.Y[=2a7^45v?E@e,>|AD_\ILpJ}8sow"Z&bCmNW1$!Sd0c]~g3BjX#fz:Q*Tkui%/r)h
UOMP<FV{KHt`(-q6;Gl'R9nxy+.Y[=2a7^45v?E@e,>|AD_\ILpJ}8sow"Z&bCmNW1$!Sd0c]~g3BjX#fz:Q*Tkui%/r)h
Upvotes: 5
Reputation: 46
I don't know anything about streams and lambdas, but I would do something like this:
Map<Character, Integer> map;
for (int i=0;i<str.length,i++){
char c = s.charAt(i);
switch(c){
case: map.containsKey(c):
int temp = map.get(c)++;
map.put(c, tmp);
break;
case: !map.containsKey(c):
map.put(c, tmp);
break;
}
}
to count the occurrences. Then it's just a matter of sorting it from highest to lowest afterwards.
Upvotes: -1
Reputation: 133975
"A map and several while loops" is certainly the easiest way to go, and probably would be very fast. The idea is:
for each character
increment its count in the map
Sort the map in descending order
Output the map keys in that order
But 100,000,000 map lookups could get pretty expensive. You could potentially speed it up by creating an array of 65,536 integer counts (or 128 characters if it's ASCII). Then:
for each character
array[(int)ch] += 1
Then, you go through that array and create a map of the characters with non-zero counts:
for i = 0 to 65535
if array[i] > 0
map.add((char)i, array[i])
Then sort the map in descending order and output the characters in that order.
That could perform quite a bit faster, simply because indexing into an array 100,000,000 times is likely a lot faster than doing 100,000,000 map lookups.
Upvotes: 6