Reputation: 7374
For example, I have this situation:
On the server we have a list of words:
{'word1', 'word2', 'word3', 'word4'}
A user sends a request to the server with some text:
"some text here word1. many many other text word4"
The server must process this input text, find every word from the server's list that appears in it, mark those words, and send the resulting text back to the user:
"some text here <mark>word1</mark>. many many other text <mark>word4</mark>"
That is the main idea. Now I need to implement this logic, so I am asking for help.
I need to decide which technologies and tools to use.
What tools can you recommend for this task?
Upvotes: 2
Views: 248
Reputation: 115368
Here is the naive solution:
// Naive: replaceAll treats each word as a regex and also matches fragments of longer words
for (String word : words) {
    text = text.replaceAll(word, "<mark>" + word + "</mark>");
}
A better solution uses a regular expression to avoid marking word fragments, e.g. wo<mark>man</mark>. You should build a regex like "\\b" + word + "\\b".
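A minimal sketch of the word-boundary version, assuming Java 9+ (for List.of). The class WordBoundaryMarker and its mark method are hypothetical names. It additionally wraps each word in Pattern.quote so that keywords containing regex metacharacters are matched literally, which the one-liner above does not do:

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class: marks each keyword only where it stands as a whole word.
class WordBoundaryMarker {
    static String mark(String text, List<String> words) {
        for (String word : words) {
            // \b on both sides rejects matches inside longer words ("man" in "woman").
            // Pattern.quote escapes any regex metacharacters in the keyword.
            String regex = "\\b" + Pattern.quote(word) + "\\b";
            // quoteReplacement keeps '$' and '\' in the keyword literal in the output.
            text = text.replaceAll(regex, Matcher.quoteReplacement("<mark>" + word + "</mark>"));
        }
        return text;
    }

    public static void main(String[] args) {
        List<String> words = List.of("word1", "word4", "man");
        // prints "a woman and a <mark>man</mark> said <mark>word1</mark>"
        System.out.println(mark("a woman and a man said word1", words));
    }
}
```

Because this still calls replaceAll once per word, text inserted by an earlier iteration is rescanned: a keyword such as "mark" would match inside the <mark> tags added for other words.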
But I'd suggest checking out ready-to-use engines like Solr (or Lucene).
Upvotes: 2
Reputation: 6675
The simplest way to accomplish this would be to use String.replaceAll. You can combine all of the keywords into one regular expression and use a back-reference to include the original word. If the keywords contain regular expression metacharacters, you will have to escape them.
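One way to read this suggestion, as a sketch assuming Java 9+: join the escaped keywords into a single alternation, wrap it in a capturing group, and refer back to the matched word with $1 in the replacement. The class CombinedRegexMarker and its mark method are hypothetical names; Pattern.quote is used for the escaping:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class: builds one regex \b(k1|k2|...)\b and marks every match.
class CombinedRegexMarker {
    static String mark(String text, List<String> keywords) {
        if (keywords.isEmpty()) {
            return text; // an empty alternation would match at every word boundary
        }
        // Pattern.quote escapes regex metacharacters inside each keyword.
        String alternation = keywords.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // Group 1 captures the keyword; $1 in the replacement is a back-reference to it.
        return text.replaceAll("\\b(" + alternation + ")\\b", "<mark>$1</mark>");
    }

    public static void main(String[] args) {
        List<String> words = List.of("word1", "word2", "word3", "word4");
        System.out.println(mark("some text here word1. many many other text word4", words));
    }
}
```

Since all keywords are matched in one scan, text produced by one replacement is never searched again.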
It's usually a mistake to call String.replaceAll in a loop, because the intermediate results could contain a match that wasn't in the input. As a contrived example, suppose I wanted to replace "ab" with "b" and "bb" with "c". The correct output for "bab" would be "bb". However, "bab".replaceAll("ab", "b").replaceAll("bb", "c") is "c". For the same reason, you wouldn't want to use String.replace in a loop, although that seems like the easiest way to accomplish the task at hand.
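The contrived example can be checked directly. Below is a sketch (Java 9+, for the StringBuilder overloads of appendReplacement/appendTail) of a single-pass replacement using Matcher.find and appendReplacement; the class SinglePassReplace and its replaceAllOnce method are hypothetical names. A LinkedHashMap is assumed so that the alternation order, which decides ties between keys matching at the same position, is predictable:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class: applies all replacements in ONE left-to-right scan,
// so output produced by one replacement is never rescanned by another.
class SinglePassReplace {
    static String replaceAllOnce(String input, Map<String, String> replacements) {
        if (replacements.isEmpty()) {
            return input;
        }
        StringBuilder alternation = new StringBuilder();
        for (String key : replacements.keySet()) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(key));
        }
        Matcher m = Pattern.compile(alternation.toString()).matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the value literal
            m.appendReplacement(out, Matcher.quoteReplacement(replacements.get(m.group())));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> rules = new LinkedHashMap<>(); // insertion order = alternation order
        rules.put("ab", "b");
        rules.put("bb", "c");
        System.out.println("chained:     " + "bab".replaceAll("ab", "b").replaceAll("bb", "c")); // c
        System.out.println("single pass: " + replaceAllOnce("bab", rules));                   // bb
    }
}
```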
If you need more performance than this approach provides, the first step would be to compile the regular expression in advance (Pattern.compile) and reuse it. If you need a lot more, there are some really interesting research papers on string search.
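A sketch of compiling in advance, assuming the keyword list is fixed while the server handles many requests (Java 9+). KeywordHighlighter is a hypothetical class name. The Pattern is built once in the constructor; each call creates its own Matcher, because Pattern instances are safe to share between threads while Matcher instances are not:

```java
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical class: holds one precompiled pattern for a fixed keyword set.
class KeywordHighlighter {
    // Compiled once in the constructor and reused for every call to highlight().
    private final Pattern pattern;

    KeywordHighlighter(Collection<String> keywords) {
        if (keywords.isEmpty()) {
            throw new IllegalArgumentException("at least one keyword is required");
        }
        String alternation = keywords.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        this.pattern = Pattern.compile("\\b(?:" + alternation + ")\\b");
    }

    String highlight(String text) {
        // A fresh Matcher per call; $0 in the replacement is the whole match.
        return pattern.matcher(text).replaceAll("<mark>$0</mark>");
    }

    public static void main(String[] args) {
        KeywordHighlighter h = new KeywordHighlighter(List.of("word1", "word2", "word3", "word4"));
        for (String request : List.of("some text here word1.", "nothing to mark", "word2 and word3")) {
            System.out.println(h.highlight(request));
        }
    }
}
```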
Upvotes: 1
Reputation: 3906
There are many open questions, such as what exactly delimits a "word". For example, do you want to highlight "full" in "full-text"?
By the way: Lucene, Solr, etc. won't help much here. Of course you can use them, but it just doesn't make sense. Their strength is building an index over text, and text can mean HUGE amounts of data. A set of words is bounded by the dictionary of the language, which is usually tiny by computer standards. A simple HashSet should suffice for your needs.
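A sketch of the HashSet approach: split the text into tokens and look each one up in the set. The word definition is an assumption chosen for illustration: a word is a maximal run of characters for which Character.isLetterOrDigit returns true, so "full-text" counts as two words. HashSetMarker and isWordChar are hypothetical names; changing isWordChar changes what counts as a word:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class: tokenizes the text and marks tokens found in the set.
class HashSetMarker {
    // Assumption: word characters are letters or digits (checked per char),
    // so '-' and '.' act as delimiters.
    static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c);
    }

    static String mark(String text, Set<String> words) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            if (!isWordChar(text.charAt(i))) {
                out.append(text.charAt(i)); // copy delimiters unchanged
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && isWordChar(text.charAt(i))) {
                i++;
            }
            String token = text.substring(start, i);
            // One HashSet lookup per token, regardless of how many keywords there are.
            if (words.contains(token)) {
                out.append("<mark>").append(token).append("</mark>");
            } else {
                out.append(token);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Set<String> words = new HashSet<>(List.of("word1", "word2", "word3", "word4", "full"));
        System.out.println(mark("some text here word1. many many other text word4", words));
        System.out.println(mark("full-text search", words)); // <mark>full</mark>-text search
    }
}
```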
Upvotes: 2