strawberrylatte

How to count words in a file

I want to write a function that counts the words in a file and reports the position of each word. I want the output to be:

a, position: 0

aah, position: 1

aahed, position: 2

I already tried the following to count the words, but I couldn't work out how to use it to get the position of each word:

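package main

import (
    "bufio"
    "fmt"
    "os"
    "strings"
)

func main() {
    // Sample input for illustration; my real code reads the file contents.
    input := "a aah aahed"
    scanner := bufio.NewScanner(strings.NewReader(input))

    // Set the split function for the scanning operation.
    scanner.Split(bufio.ScanWords)

    // Count the words.
    count := 0
    for scanner.Scan() {
        count++
    }

    if err := scanner.Err(); err != nil {
        fmt.Fprintln(os.Stderr, "reading input:", err)
    }

    fmt.Printf("%d\n", count)
}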

Is it possible to use a for loop for this? I would like to index the words by position, for example word[position] == word[position+1], to find out whether the word at a given position is the same as the word at the next position.

Answers (2)

Hermsi1337

Suppose you have a file named testfile.txt with this content:

this is fine

You can use this Go program to loop over the words and print each word with its position:

package main

import (
    "bufio"
    "fmt"
    "os"
)

func main() {
    // initiate file-handle to read from
    fileHandle, err := os.Open("testfile.txt")

    // check if file-handle was initiated correctly
    if err != nil {
        panic(err)
    }

    // make sure to close file-handle upon return
    defer fileHandle.Close()

    // initiate scanner from file handle
    fileScanner := bufio.NewScanner(fileHandle)

    // tell the scanner to split by words
    fileScanner.Split(bufio.ScanWords)

    // initiate counter
    count := 0

    // for looping through results
    for fileScanner.Scan() {
        fmt.Printf("word: '%s' - position: '%d'\n", fileScanner.Text(), count)
        count++
    }

    // check if there was an error while reading words from file
    if err := fileScanner.Err(); err != nil {
        panic(err)
    }

    // print total word count
    fmt.Printf("total word count: '%d'\n", count)
}

Output:

$ go run main.go
word: 'this' - position: '0'
word: 'is' - position: '1'
word: 'fine' - position: '2'
total word count: '3'

If you want to compare words by index, you could load them into a slice first.

Suppose testfile.txt now contains:

fine this is fine

Use this code:

package main

import (
    "bufio"
    "fmt"
    "os"
)

func main() {
    // initiate file-handle to read from
    fileHandle, err := os.Open("testfile.txt")

    // check if file-handle was initiated correctly
    if err != nil {
        panic(err)
    }

    // make sure to close file-handle upon return
    defer fileHandle.Close()

    // initiate scanner from file handle
    fileScanner := bufio.NewScanner(fileHandle)

    // tell the scanner to split by words
    fileScanner.Split(bufio.ScanWords)

    // initiate wordSlice
    var wordSlice []string

    // for looping through results
    for fileScanner.Scan() {
        wordSlice = append(wordSlice, fileScanner.Text())
    }

    // check if there was an error while reading words from file
    if err := fileScanner.Err(); err != nil {
        panic(err)
    }

    // loop through word slice and print word with index
    for i, w := range wordSlice {
        fmt.Printf("word: '%s' - position: '%d'\n", w, i)
    }

    // compare words by index; check the index first so a shorter file does not panic
    firstWordPos := 0
    equalsWordPos := 3
    if equalsWordPos < len(wordSlice) && wordSlice[firstWordPos] == wordSlice[equalsWordPos] {
        fmt.Printf("word at position '%d' and '%d' is equal: '%s'\n", firstWordPos, equalsWordPos, wordSlice[firstWordPos])
    }

    // print total word count
    fmt.Printf("total word count: '%d'\n", len(wordSlice))
}

Output:

$ go run main.go
word: 'fine' - position: '0'
word: 'this' - position: '1'
word: 'is' - position: '2'
word: 'fine' - position: '3'
word at position '0' and '3' is equal: 'fine'
total word count: '4'

Vadim Ashikhman

You can read the input one character at a time, which gives you full control over the data you output. In Go, characters (Unicode code points) are called runes:

package main

import (
    "bytes"
    "fmt"
    "io"
    "os"
    "strings"
)

func main() {
    // os.ReadFile replaces the deprecated ioutil.ReadFile (Go 1.16+).
    b, err := os.ReadFile("test.txt")
    if err != nil {
        panic(err)
    }

    reader := bytes.NewReader(b)
    // word is a temporary buffer that collects the characters of the current word.
    word := strings.Builder{}
    wordPos := 0 // column at which the current word starts
    line := 0    // current line number, starting at 0
    pos := 0     // column of the character being read on the current line
    for {
        // Read the next character (rune).
        if r, _, err := reader.ReadRune(); err != nil {
            if err == io.EOF {
                // End of file: output the last word.
                fmt.Println(word.String(), "line:", line, "position:", wordPos)
                break
            } else {
                panic(err)
            }
        } else {
            // On a newline, output the word, then reset the word buffer and position counters.
            if r == '\n' {
                fmt.Println(word.String(), "line:", line, "position:", wordPos)
                word.Reset()
                pos = 0
                wordPos = 0
                line++
            } else if r == ' ' { // Word separator: output the word, reset the buffer, and set the next word's position.
                fmt.Println(word.String(), "line:", line, "position:", wordPos)
                word.Reset()
                wordPos = pos + 1
                pos++
            } else { // Regular character: append it to the word buffer.
                word.WriteRune(r)
                pos++
            }
        }
    }
}

I use strings.Builder to avoid unnecessary string copying while each word is being built.

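You will also have to adjust this example to handle edge cases such as empty lines, consecutive spaces, and a trailing newline at the end of the file, each of which makes it print an empty word.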
