Reputation: 89
I'm trying to apply the same replacement instructions several thousand times to different input strings with as little overhead as possible. I need to consider two things for this:
With that in mind, this is the code I came up with:
public class Replacements {
private String[] search;
private String[] replace;
Replacements(String[] s, String[] r)
{
if (s.length!=r.length) throw new IllegalArgumentException();
Map<String,String> map = new HashMap<String,String>();
for (int i=0;i<s.length;i++)
{
map.put(s[i], r[i]);
}
List<String> sortedKeys = new ArrayList(map.keySet());
Collections.sort(sortedKeys, new StringLengthComparator());
this.search = sortedKeys.toArray(new String[0]);
Stack<String> r2 = new Stack<>();
sortedKeys.stream().forEach((i) -> {
r2.push(map.get(i));
});
this.replace = r2.toArray(new String[0]);
}
public String replace(String input)
{
return replace(input,0);
}
private String replace(String input,int i)
{
String out = "";
List<String> parts = Arrays.asList(input.split(this.search[i],-1));
for (Iterator it = parts.iterator(); it.hasNext();)
{
String part = it.next().toString();
if (part.length()>0 && i<this.search.length-1) out += replace(part,i+1);
if (it.hasNext()) out += this.replace[i];
}
return out;
}
}
And then
String[] words;
//fill variable words
String[] s_input = "ou|u|c|ch|ce|ci".split("\\|",-1);
String[] r_input = "u|a|k|c|se|si".split("\\|",-1);
Replacements reps = new Replacements(s_input,r_input);
for (String word : words) {
System.out.println(reps.replace(word));
}
(s_input
and r_input
would be up to the user, so they're just examples, just like the program wouldn't actually use println()
)
This code makes sure longer search strings get looked for first and also covers the second condition above.
It is, however, quite costly. What would be the most efficient way to accomplish what I'm doing here (especially if the number of Strings in words
is significantly large)?
With my current code, "couch" should be converted into "kuc" (except it doesn't, apparently; it now does, thanks to the -1 in split(p,-1)
)
Upvotes: 1
Views: 132
Reputation: 1686
public class Replacements
{
private String[] search; // sorted in descending length and order, eg: sch, ch, c
private String[] replace; // corresponding replacement
Replacements(String[] s, String[] r)
{
if (s.length != r.length)
throw new IllegalArgumentException();
final TreeMap<String, String> map = new TreeMap<String, String>(Collections.reverseOrder());
for (int i = 0; i < s.length; i++)
map.put(s[i], r[i]);
this.search = map.keySet().toArray(new String[map.size()]);
this.replace = map.values().toArray(new String[map.size()]);
}
public String replace(String input)
{
final StringBuilder result = new StringBuilder();
// start of yet-to-be-copied substring
int s = 0;
SEARCH:
for (int i = s; i < input.length(); i++)
{
for (int p = 0; p < this.search.length; p++)
{
if (input.regionMatches(i, this.search[p], 0, this.search[p].length()))
{
// append buffer and replacement
result.append(input, s, i).append(this.replace[p]);
// skip beyond current match and reset buffer
i += this.search[p].length();
s = i--;
continue SEARCH;
}
}
}
if (s == 0) // no matches? no changes!
return input;
// append remaining buffer
return result.append(input, s, input.length()).toString();
}
}
Upvotes: 0
Reputation: 109547
One could make a regex pattern from the keys and leave it to that module for optimization.
Obviously
"(ou|u|ch|ce|ci|c)"
needs to take care of ce/ci/c, either by reverse sorting or immediately as tree:
"(c(e|h|i)?|ou|u)"
Then
String soughtKeys = "ou|u|ch|ce|ci|c"; // c last
String replacements = "u|a|c|se|si|k";
Map<String, String> map = new HashMap<>();
... fill map
Pattern pattern = Pattern.compile("(" + soughtKeys + ")");
for (String word : words) {
StringBuffer sb = new StringBuffer();
Matcher m = pattern.matcher(word);
while (m.find()) {
m.appendReplacement(sb, map.get(m.group());
}
m.appendTail(sb);
System.out.printf("%s -> %s%n", word, sb.toString());
}
The advantage being that regex is quite smart (though slow), and replacements are not done over replaced text.
Upvotes: 0
Reputation: 86764
This is not a full solution but it shows how to scan the input and find all target substrings in one pass. You would use a StringBuilder
to assemble the result, looking up the replacements in a Map as you are currently doing. Use the start and end indexes to handle copying of non-matching segments.
public static void main(String[] args) throws Exception
{
Pattern p = Pattern.compile("(ou|ch|ce|ci|u|c)");
Matcher m = p.matcher("auouuchcceaecxici");
while (m.find())
{
MatchResult r = m.toMatchResult();
System.out.printf("s=%d e=%d '%s'\n", r.start(), r.end(), r.group());
}
}
Output:
s=1 e=2 'u'
s=2 e=4 'ou'
s=4 e=5 'u'
s=5 e=7 'ch'
s=7 e=8 'c'
s=8 e=10 'ce'
s=12 e=13 'c'
s=15 e=17 'ci'
Note the strings in the regex have to be sorted in order of descending length to work correctly.
Upvotes: 1