Reputation: 21
I need an alternative to strings.Contains() to check whether a given word exists in a sentence (string).
For example, I need to check whether the word "can" is in the sentence "I can run fast". If I use strings.Contains("I can run fast", "can"), this gives true. But strings.Contains("I cannot run fast", "can") also gives true, because "cannot" contains "can". How can I check for the exact word, so that "can" gives true and "cannot" gives false in this scenario?
Upvotes: 0
Views: 2267
Reputation: 2784
I need an alternative to strings.Contains() to check whether a given word exists in a sentence (string).
I'm trying to implement a filter for a given set of words.
Here's a solution that uses a simple algorithm to identify words. It can distinguish between "can", "cannot", and "can't".
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// newFilter builds a set of lowercased, trimmed words to look for.
func newFilter(words []string) map[string]struct{} {
	filter := make(map[string]struct{}, len(words))
	for _, word := range words {
		word = strings.TrimSpace(word)
		word = strings.ToLower(word)
		if len(word) > 0 {
			filter[word] = struct{}{}
		}
	}
	return filter
}

// applyFilter reports whether text contains any word in filter.
func applyFilter(text string, filter map[string]struct{}) bool {
	const (
		rApostrophe  = '\u0027'
		sApostrophe  = string(rApostrophe)
		sApostropheS = string(rApostrophe) + "s"
		rSoftHyphen  = '\u00AD'
		sSoftHyphen  = string(rSoftHyphen)
		sHyphenLF    = "-\n"
		sHyphenCRLF  = "-\r\n"
	)

	// A word is a run of letters and apostrophes; anything else separates words.
	split := func(r rune) bool {
		return !unicode.IsLetter(r) && r != rApostrophe
	}

	text = strings.ToLower(text)
	// Remove soft hyphens and rejoin words hyphenated across line breaks.
	// Each replacement runs unconditionally so that text with mixed
	// LF and CRLF line endings is handled too.
	text = strings.ReplaceAll(text, sSoftHyphen, "")
	text = strings.ReplaceAll(text, sHyphenLF, "")
	text = strings.ReplaceAll(text, sHyphenCRLF, "")

	words := strings.FieldsFunc(text, split)
	for _, word := range words {
		// Strip a trailing apostrophe or a possessive 's before the lookup.
		if strings.HasSuffix(word, sApostrophe) {
			word = word[:len(word)-len(sApostrophe)]
		} else if strings.HasSuffix(word, sApostropheS) {
			word = word[:len(word)-len(sApostropheS)]
		}
		if _, ok := filter[word]; ok {
			return true
		}
	}
	return false
}

func main() {
	filter := newFilter([]string{"can"})
	text := "I can run fast"
	fmt.Println(applyFilter(text, filter)) // true
	text = "I cannot run fast"
	fmt.Println(applyFilter(text, filter)) // false
	text = "I can-\nnot run fast"
	fmt.Println(applyFilter(text, filter)) // false
	text = "I can't run fast"
	fmt.Println(applyFilter(text, filter)) // false

	filter = newFilter([]string{"cannot", "can't"})
	text = "I can run fast"
	fmt.Println(applyFilter(text, filter)) // false
	text = "I cannot run fast"
	fmt.Println(applyFilter(text, filter)) // true
	text = "I can-\nnot run fast"
	fmt.Println(applyFilter(text, filter)) // true
	text = "I can't run fast"
	fmt.Println(applyFilter(text, filter)) // true
}
https://go.dev/play/p/sQpTt5JY8Qt
Upvotes: 1
Reputation: 213318
Just as a first attempt, you can try using a regular expression:
import "regexp"

var containsCanRegex = regexp.MustCompile(`\b[Cc]an\b`)

func containsCan(s string) bool {
	return containsCanRegex.MatchString(s)
}
Note that this matches title-case, so it matches "Can I go?".
The \b in a regular expression matches a "word boundary". It just means there is a word character on one side, and a non-word character, beginning of text, or end of text on the other side.
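To make the boundary rule concrete, here is a small sketch that runs a lowercase-only version of the pattern against a few inputs. The names wordCan and hasCan are made up for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// wordCan matches "can" with a word boundary on each side.
// (Hypothetical name; lowercase-only for simplicity.)
var wordCan = regexp.MustCompile(`\bcan\b`)

// hasCan reports whether s contains "can" delimited by word boundaries.
func hasCan(s string) bool {
	return wordCan.MatchString(s)
}

func main() {
	inputs := []string{
		"I can run fast",    // space on both sides: boundaries present
		"I cannot run fast", // "n" follows "can": no boundary after it
		"scan",              // "s" precedes "can": no boundary before it
		"a tin can.",        // "." is a non-word character: boundary present
	}
	for _, s := range inputs {
		fmt.Printf("%q: %v\n", s, hasCan(s))
	}
}
```

In Go's regexp package, \b is an ASCII word boundary, so "word character" here means a letter, digit, or underscore from the ASCII range.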
Note that this will match "can't" because \b treats ' as a word boundary (since it's a non-word character). It sounds like this is not what you want. In order to come up with a more general solution, you may want to know just how general you want the solution to be. A very basic approach would be to split the words first, and then check if any of those words match "can". You could split the words with a regular expression or by using a text segmentation library.
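A minimal sketch of the split-then-compare approach, using only the standard library. It assumes that a word is a run of letters, digits, and apostrophes, which is a simplification; containsWord is a name invented for this example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// containsWord reports whether word occurs in s as a whole word,
// ignoring case. Assumption: a word is a run of letters, digits,
// and apostrophes; everything else separates words.
func containsWord(s, word string) bool {
	isSep := func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != '\''
	}
	for _, w := range strings.FieldsFunc(s, isSep) {
		if strings.EqualFold(w, word) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(containsWord("I can run fast", "can"))    // true
	fmt.Println(containsWord("I cannot run fast", "can")) // false
	fmt.Println(containsWord("I can't run fast", "can"))  // false
	fmt.Println(containsWord("Can I go?", "can"))         // true
}
```

Because the apostrophe counts as part of a word, "can't" stays a single token and does not match "can". The trade-off is that a word wrapped in single quotes, such as 'can', keeps its quotes and will not match either.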
I don't know how to write a regular expression that would accept "can" but reject "can't" in a sentence, because the "regexp" package does not support negative lookahead.
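One possible workaround, offered as a sketch rather than a general answer: instead of \b, require that the characters next to "can" are neither word characters nor apostrophes, or that "can" sits at the start or end of the text. This needs no lookahead. The pattern and the names canOnly and isCanOnly are assumptions made for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// canOnly matches "can" (case-insensitive) only when each neighbor is
// the start/end of the text or a character that is neither a word
// character nor an apostrophe. Hypothetical pattern, not from the
// answer above.
var canOnly = regexp.MustCompile(`(?i)(?:^|[^\w'])can(?:$|[^\w'])`)

// isCanOnly reports whether s contains "can" as a standalone word,
// rejecting "cannot" and "can't".
func isCanOnly(s string) bool {
	return canOnly.MatchString(s)
}

func main() {
	fmt.Println(isCanOnly("I can run fast"))    // true
	fmt.Println(isCanOnly("I cannot run fast")) // false
	fmt.Println(isCanOnly("I can't run fast"))  // false
	fmt.Println(isCanOnly("Can I go?"))         // true
	fmt.Println(isCanOnly("Yes we can."))       // true
}
```

The match consumes the neighboring characters, so this suits a yes/no check better than extracting match positions. It also rejects 'can' in single quotes and only treats ASCII letters, digits, and underscore as word characters.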
Upvotes: 3