JKS Kar

Reputation: 21

How to check whether a given word exists in a sentence (string) without using strings.Contains in Go

I need an alternative to strings.Contains() to check whether a given word exists in a sentence (string).

As an example, I need to check whether the word "can" is in the sentence "I can run fast". strings.Contains("I can run fast", "can") returns true, but strings.Contains("I cannot run fast", "can") also returns true, because "cannot" contains "can". How can I check for the exact word, so that "can" gives true and "cannot" gives false in the scenario above?

Upvotes: 0

Views: 2267

Answers (2)

rocka2q

Reputation: 2784

I need an alternative method instead of strings.Contains() to check whether the given word exists in a sentence (string).

I'm trying to implement a filter for a given set of words.

Here's a solution which uses simple algorithms for a word. The algorithms can distinguish between "can", "cannot", and "can't".

package main

import (
    "fmt"
    "strings"
    "unicode"
)

func newFilter(words []string) map[string]struct{} {
    filter := make(map[string]struct{}, len(words))
    for _, word := range words {
        word = strings.TrimSpace(word)
        word = strings.ToLower(word)
        if len(word) > 0 {
            filter[word] = struct{}{}
        }
    }
    return filter
}

func applyFilter(text string, filter map[string]struct{}) bool {
    const (
        rApostrophe  = '\u0027'
        sApostrophe  = string(rApostrophe)
        sApostropheS = string(rApostrophe) + "s"
        rSoftHyphen  = '\u00AD'
        sSoftHyphen  = string(rSoftHyphen)
        sHyphenLF    = "-\n"
        sHyphenCRLF  = "-\r\n"
    )

    split := func(r rune) bool {
        return !unicode.IsLetter(r) && r != rApostrophe
    }

    text = strings.ToLower(text)
    // ReplaceAll is a no-op when the substring is absent, and a text may
    // mix LF and CRLF line endings, so apply every replacement.
    text = strings.ReplaceAll(text, sSoftHyphen, "")
    text = strings.ReplaceAll(text, sHyphenLF, "")
    text = strings.ReplaceAll(text, sHyphenCRLF, "")

    words := strings.FieldsFunc(text, split)
    for _, word := range words {
        if strings.HasSuffix(word, sApostrophe) {
            word = word[:len(word)-len(sApostrophe)]
        } else if strings.HasSuffix(word, sApostropheS) {
            word = word[:len(word)-len(sApostropheS)]
        }
        if _, ok := filter[word]; ok {
            return true
        }
    }
    return false
}

func main() {
    filter := newFilter([]string{"can"})
    text := "I can run fast"
    fmt.Println(applyFilter(text, filter))
    text = "I cannot run fast"
    fmt.Println(applyFilter(text, filter))
    text = "I can-\nnot run fast"
    fmt.Println(applyFilter(text, filter))
    text = "I can't run fast"
    fmt.Println(applyFilter(text, filter))

    filter = newFilter([]string{"cannot", "can't"})
    text = "I can run fast"
    fmt.Println(applyFilter(text, filter))
    text = "I cannot run fast"
    fmt.Println(applyFilter(text, filter))
    text = "I can-\nnot run fast"
    fmt.Println(applyFilter(text, filter))
    text = "I can't run fast"
    fmt.Println(applyFilter(text, filter))
}

https://go.dev/play/p/sQpTt5JY8Qt

Upvotes: 1

Dietrich Epp

Reputation: 213318

Just as a first attempt, you can try using a regular expression:

import "regexp"

var containsCanRegex = regexp.MustCompile(`\b[Cc]an\b`)

func containsCan(s string) bool {
    return containsCanRegex.MatchString(s)
}

Note that this matches title-case, so it matches "Can I go?".

The \b in a regular expression matches a "word boundary". It just means there is a word character on one side, and a non-word character, beginning of text, or end of text on the other side.
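
To see the word-boundary behavior concretely, here is a small sketch that runs the same pattern against a few sentences. The helper name matchesCan and the sample sentences are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// canRegex is the pattern from above: "can" or "Can" with a word
// boundary on each side.
var canRegex = regexp.MustCompile(`\b[Cc]an\b`)

// matchesCan is a hypothetical helper name used only for this demo.
func matchesCan(s string) bool {
	return canRegex.MatchString(s)
}

func main() {
	for _, s := range []string{
		"I can run fast",    // "can" has a space on both sides
		"I cannot run fast", // "can" is followed by the word character 'n'
		"Can I go?",         // "Can" starts at the beginning of the text
	} {
		fmt.Printf("%q -> %v\n", s, matchesCan(s))
	}
}
```

This should report true, false, and true for the three sentences, in that order.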

Note that this will match "can't" because \b treats ' as a word boundary (since it's a non-word character). It sounds like this is not what you want. In order to come up with a more general solution, you may want to know just how general you want the solution to be. A very basic approach would be to split the words first, and then check if any of those words match "can". You could split the words with a regular expression or by using a text segmentation library.
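
A minimal sketch of the split-then-compare idea, using only the standard library. It assumes that a "word" is a run of letters and apostrophes, which is my choice for this example rather than a general rule; hasWord is a name invented here.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// hasWord reports whether word appears in s as a whole word.
// Assumption: a "word" is a run of letters and apostrophes, so "can't"
// stays in one piece and does not count as "can".
func hasWord(s, word string) bool {
	isSeparator := func(r rune) bool {
		return !unicode.IsLetter(r) && r != '\''
	}
	for _, w := range strings.FieldsFunc(s, isSeparator) {
		// EqualFold compares case-insensitively, so "Can" matches "can".
		if strings.EqualFold(w, word) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasWord("I can run fast", "can"))    // true
	fmt.Println(hasWord("I cannot run fast", "can")) // false
	fmt.Println(hasWord("I can't run fast", "can"))  // false
	fmt.Println(hasWord("Can I go?", "can"))         // true
}
```

Treating the apostrophe as part of a word is what keeps "can't" from matching; the trade-off is that a word wrapped in single quotes would not match either unless the quotes are trimmed first.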

I don't know how to write a regular expression that would accept "can" but reject "can't" in a sentence--the "regexp" package does not support negative lookahead.
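
One possible workaround that needs no lookahead is to spell out the boundaries by hand with a character class instead of \b. This is only a sketch under my own assumption that letters and apostrophes are the characters that make something "part of a word"; the names wordCanRegex and containsWordCan are invented for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// wordCanRegex replaces \b with explicit boundaries: "can" must be
// preceded by the start of the text or by a character that is neither a
// letter nor an apostrophe, and followed by the end of the text or by
// such a character. (?i) makes the match case-insensitive.
var wordCanRegex = regexp.MustCompile(`(?i)(?:^|[^\p{L}'])can(?:$|[^\p{L}'])`)

// containsWordCan is a hypothetical helper name for this sketch.
func containsWordCan(s string) bool {
	return wordCanRegex.MatchString(s)
}

func main() {
	fmt.Println(containsWordCan("I can run fast"))    // true
	fmt.Println(containsWordCan("I cannot run fast")) // false
	fmt.Println(containsWordCan("I can't run fast"))  // false
	fmt.Println(containsWordCan("Can I go?"))         // true
}
```

Because the apostrophe is excluded on both sides, a quoted 'can' is rejected as well, so this may still be too strict or too loose depending on the text being filtered.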

Upvotes: 3
