zonyang
zonyang

Reputation: 878

Best way to search for a special string in a text

If I have a piece of text around 3000 characters. I want search for strings with certain characteristics for example strings like [*].

That is, I want to get [a] and [bc] from

sjfhshdkfjhskdhfksdf[a]sfdsgfsdf[bc]

I know there is an algorithm called KMP that guarantee a linear time searching operation through a text, but here I don't have a fixed string to be found, maybe I have to use some regular expression at some place.

How can I do this better than O(n^2)? Is there any light libraries for this if I'm using java?

Upvotes: 2

Views: 260

Answers (2)

Bohemian
Bohemian

Reputation: 425348

Here's how to do it in one line:

String[] hits = str.replaceAll("^.*?\\[|][^\\]]*$", "").split("].*?\\[");

This works by stripping off leading and trailing chars up to and including the first/last opening/closing square bracket, then splits on a close bracket to the next opening bracket (inclusive).

Upvotes: 0

castletheperson
castletheperson

Reputation: 33516

No libraries needed, you've effectively described a use case for regex! They are highly optimized for searching, and in this case will be O(n).

String str = "sjfhshdkfjhskdhfksdf[a]sfdsgfsdf[bc]";
List<String> allMatches = new ArrayList<>();
Matcher m = Pattern.compile("\\[[^\\]]*]").matcher(str);
while (m.find()) {
    allMatches.add(m.group());
}

Regex Demo

If you have any doubts though and really want some O(n) that you can see, here's an algorithm:

String str = "sjfhshdkfjhskdhfksdf[a]sfdsgfsdf[bc]";
List<String> allMatches = new ArrayList<>();
for (int i = str.indexOf('['), j; i != -1; i = str.indexOf('[', j + 1)) {
    j = str.indexOf(']', i + 1);
    // if `j` is -1, the brackets are unbalanced. Perhaps throw an Exception?
    allMatches.add(str.substring(i, j + 1));
}

Upvotes: 6

Related Questions