Reputation: 3578
I need to find whole words in a sentence, but without using regular expressions. So if I wanted to find the word "the" in this sentence: "The quick brown fox jumps over the lazy dog", I'm currently using:
String text = "the, quick brown fox jumps over the lazy dog";
String keyword = "the";
Matcher matcher = Pattern.compile("\\b"+keyword+"\\b").matcher(text);
Boolean contains = matcher.find();
but if I used:
Boolean contains = text.contains(keyword);
and padded the keyword with spaces, it wouldn't find the first "the" in the sentence, both because that occurrence isn't surrounded by whitespace and because of the punctuation.
To be clear, I'm building an Android app and I'm getting memory leaks. That might be because I'm using a regular expression in a ListView, so it performs a regular-expression match X times, depending on the number of items in the ListView.
Upvotes: 0
Views: 2332
Reputation: 81
I have a project that requires whole-word matching, but I can't use regular expressions (because regular expressions escape some keywords), so I wrote my own code to simulate \bxxx\b without regular expressions. I only know C#, and it worked fine.
using System;
using System.Text.RegularExpressions;

public static class Finder
{
    public static bool Find(string? input, string? pattern, bool isMatchCase = false, bool isMatchWholeWord = false, bool isMatchRegex = false)
    {
        if (String.IsNullOrWhiteSpace(input) || String.IsNullOrWhiteSpace(pattern))
        {
            return false;
        }
        // Case-insensitive plain-text search: normalize both strings up front.
        if (!isMatchCase && !isMatchRegex)
        {
            input = input.ToLower();
            pattern = pattern.ToLower();
        }
        // Whole-word search without regex: emulate \b on both sides of each hit.
        if (isMatchWholeWord && !isMatchRegex)
        {
            int len = pattern.Length;
            int suffix = 0;
            while (true)
            {
                int start = input.IndexOf(pattern, suffix);
                if (start == -1)
                {
                    return false;
                }
                int end = start + len - 1;
                int prefix = start - 1;
                suffix = end + 1;
                bool isPrefixMatched, isSuffixMatched;
                if (start == 0)
                {
                    isPrefixMatched = true;
                }
                else
                {
                    // A boundary exists where word-ness changes between neighbors.
                    isPrefixMatched = IsWord(input[prefix]) != IsWord(input[start]);
                }
                if (end == input.Length - 1)
                {
                    isSuffixMatched = true;
                }
                else
                {
                    isSuffixMatched = IsWord(input[suffix]) != IsWord(input[end]);
                }
                if (isPrefixMatched && isSuffixMatched)
                {
                    return true;
                }
            }
        }
        if (isMatchRegex)
        {
            if (isMatchWholeWord)
            {
                if (!pattern.StartsWith(@"\b"))
                {
                    pattern = $@"\b{pattern}";
                }
                if (!pattern.EndsWith(@"\b"))
                {
                    pattern = $@"{pattern}\b";
                }
            }
            return Regex.IsMatch(input, pattern, isMatchCase ? RegexOptions.None : RegexOptions.IgnoreCase);
        }
        return input.Contains(pattern);
    }

    // Same notion of a word character as regex \w: letters, digits, underscore.
    private static bool IsWord(char ch)
    {
        return Char.IsLetterOrDigit(ch) || ch == '_';
    }
}
Upvotes: 0
Reputation: 11
From the comments in the StringTokenizer class:
StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.
The following example illustrates how the String.split method can be used to break up a string into its basic tokens:
String[] result = "this is a test".split("\\s");
for (int x=0; x<result.length; x++)
System.out.println(result[x]);
prints the following output:
this
is
a
test
Iterate through the resulting string array, test each element for equality with your keyword, and keep a count.
int count = 0;
for (String s : result)
{
    if (s.equals(keyword))
        count++;
}
If this is a homework assignment, tell your lecturer to read up on Java; times have changed. I remember having the exact same stupid questions during school, and it does nothing to prepare you for the real world.
Upvotes: 0
Reputation: 5054
Simply iterate over the characters and keep storing them in a char buffer. Every time you see a whitespace, empty the buffer into a list of words and go on till you reach the end.
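A minimal sketch of that character-buffer idea, in case it helps. The class and method names (CharBufferWords, words, containsWord) are made up for illustration, and it assumes that any character that is not a letter or digit (so punctuation as well as whitespace) ends a word, which goes slightly beyond "whitespace" but handles the "the," case from the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from any library.
final class CharBufferWords {

    // Scans the text once, collecting letters/digits into a buffer and
    // flushing the buffer into the word list at every non-word character.
    static List<String> words(String text) {
        List<String> words = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                buffer.append(c);
            } else if (buffer.length() > 0) {
                words.add(buffer.toString());
                buffer.setLength(0); // empty the buffer for the next word
            }
        }
        // Flush the last word if the text did not end with a separator.
        if (buffer.length() > 0) {
            words.add(buffer.toString());
        }
        return words;
    }

    // Case-sensitive whole-word check built on top of words().
    static boolean containsWord(String text, String keyword) {
        return words(text).contains(keyword);
    }
}
```

Calling CharBufferWords.containsWord(text, keyword) with the text and keyword from the question should report a match, since the comma after the first "the" is treated as a separator.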
Upvotes: 0
Reputation: 718788
What you do is search for "the". Then, for each match, you test whether the surrounding characters are white space (or punctuation), or whether the match is at the beginning or end of the string, respectively.
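Roughly, that could look like the sketch below. The names (WholeWordFinder, indexOfWord, isWordChar) are hypothetical, and it assumes a "word character" means a letter, digit, or underscore, mirroring what \b uses in a regular expression; everything else counts as a boundary.

```java
// Hypothetical helper, not from any library.
final class WholeWordFinder {

    // Returns the index of the first whole-word occurrence of keyword in
    // text, or -1 if there is none. The comparison is case-sensitive.
    static int indexOfWord(String text, String keyword) {
        if (keyword.isEmpty()) {
            return -1;
        }
        int from = 0;
        while (true) {
            int start = text.indexOf(keyword, from);
            if (start < 0) {
                return -1; // no further occurrences
            }
            int end = start + keyword.length(); // index just past the match
            // A side is fine if it is the string edge or a non-word character.
            boolean leftOk = start == 0 || !isWordChar(text.charAt(start - 1));
            boolean rightOk = end == text.length() || !isWordChar(text.charAt(end));
            if (leftOk && rightOk) {
                return start;
            }
            from = start + 1; // keep searching after this false hit
        }
    }

    // Assumption: same notion of word character as regex \w.
    static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }
}
```

Since this only uses String.indexOf and charAt, it allocates nothing per call, which may matter when it runs once per ListView row.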
Upvotes: 1
Reputation: 72981
If you needed to check for multiple words and do it without regular expressions you could use StringTokenizer with a space as the delimiter.
You could then build a custom search method. Otherwise, the other solutions using String.contains() or String.indexOf() qualify.
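As an illustration, a custom search method along those lines might look like this. TokenizerSearch and findWords are invented names, and the delimiter string adds some common punctuation to the space as an assumption, so that "the," from the question still yields the token "the".

```java
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;

// Hypothetical helper, not from any library.
final class TokenizerSearch {

    // Returns the subset of keywords that occur as whole tokens in text.
    static Set<String> findWords(String text, Set<String> keywords) {
        Set<String> found = new HashSet<>();
        // Assumption: whitespace plus a few punctuation marks separate words.
        StringTokenizer tokens = new StringTokenizer(text, " \t\n,.;:!?");
        while (tokens.hasMoreTokens()) {
            String token = tokens.nextToken();
            if (keywords.contains(token)) {
                found.add(token);
            }
        }
        return found;
    }
}
```

The text is tokenized once no matter how many keywords are checked, which is the main appeal when looking for several words at the same time.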
Upvotes: 1
Reputation: 33171
public int findWholeWord(final String text, final String searchString) {
return (" " + text + " ").indexOf(" " + searchString + " ");
}
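This will give you the index of the first occurrence of the search word (here "the") as a space-delimited word, or -1 if there is none. Note that an occurrence directly followed by punctuation, such as "the,", is not matched.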
Upvotes: 1
Reputation: 51052
Split the string on space, and then see if the resulting array contains your word.
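For example, something like the following sketch (SplitSearch and containsWord are made-up names). It splits on a literal space only, as described, so a token with trailing punctuation such as "the," will not equal "the".

```java
import java.util.Arrays;

// Hypothetical helper, not from any library.
final class SplitSearch {

    // Splits on single spaces and checks whether any token equals keyword.
    static boolean containsWord(String text, String keyword) {
        return Arrays.asList(text.split(" ")).contains(keyword);
    }
}
```

Note that String.split takes a regular expression as its argument, so depending on how strictly "without regular expressions" is meant, this may or may not qualify.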
Upvotes: 0