Julia Fasick

Manipulating text files to find keywords

I've been working on a program that finds words that appear only once in a text. However, when the program finds such a word, I want it to show some context around it.

Here's my code.

from collections import Counter
from string import punctuation

text = str("bible.txt")
with open(text) as f:
    word_counts = Counter(word.strip(punctuation) for line in f for word in line.split())

unique = [word.lower() for word, count in word_counts.items() if count == 1]

with open(text, 'r') as myfile:
    wordlist = myfile.read().lower()

print(unique)
print(len(unique), " unique words found.")

for word in unique:
    first = 1
    second = 1
    index = wordlist.index(word)
    if wordlist[index - first:index] is not int():
        first += 1
    if wordlist[index:index + second] is not ".":
        second += 1
    print(" ")

    first_part = wordlist[index - first:index]
    second_part = wordlist[index:index + second]
    print(word)
    print("%s %s" % ("".join(first_part), "".join(second_part)))

Where this is the input text.

Ideally, it'd show

sojournings
1 Jacob lived in the land of his father's sojournings, in the land of Canaan.

generations
2 These are the generations of Jacob.

Basically, I want it to show the sentence that the word is in, with the verse number at the beginning. I know I need to do something with the index, but I honestly don't know how.
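One way to get exactly that output is to split the text into sentences first and then pick the sentence each unique word belongs to, so no index arithmetic is needed. This is a minimal sketch, not a definitive implementation: it assumes sentences end with ".", "!" or "?" followed by whitespace, that the verse number sits at the start of its sentence, and it counts words case-insensitively (as the `.lower()` in the code above suggests). The function name `unique_word_sentences` is hypothetical.

```python
import re
from collections import Counter
from string import punctuation


def unique_word_sentences(text):
    """Map each word that occurs exactly once to the sentence containing it.

    Assumes sentences end with '.', '!' or '?' followed by whitespace.
    """
    # Split after sentence-ending punctuation; newlines count as whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Count lowercase words with punctuation stripped from their ends
    counts = Counter(w.strip(punctuation).lower() for w in text.split())
    result = {}
    for sentence in sentences:
        for w in sentence.split():
            key = w.strip(punctuation).lower()
            if key and counts[key] == 1:
                # Collapse internal newlines so the sentence prints on one line
                result[key] = " ".join(sentence.split())
    return result


text = ("1 Jacob lived in the land of his father's sojournings, in the land of\n"
        "Canaan. 2 These are the generations of Jacob.")
for word, sentence in unique_word_sentences(text).items():
    print(word)
    print(sentence)
```

Note that verse numbers such as "1" and "2" also show up as unique "words" here, just as they would with the `Counter` approach above.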

Any help would be greatly appreciated.

Thanks, Ben

Answers (2)

Julia Fasick

I'm just gonna leave the completed code here for anyone who comes across this in the future.

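from collections import Counter
from string import punctuation

path = input("Path to file: ")
with open(path) as f:
    text = f.read()

word_counts = Counter(word.strip(punctuation) for word in text.split())

# Replace newlines with spaces (not '') so words on adjacent lines don't run together
wordlist = text.replace('\n', ' ')

unique = [word for word, count in word_counts.items() if count == 1]

print(unique)
print(len(unique), "unique words found.")

# "." or "," or "!" or "?" evaluates to just ".", so test each end mark separately.
# Commas are left out so the whole sentence is printed, not just a clause.
sentence_ends = ".!?"

for word in unique:
    print(" ")
    word_posn = wordlist.find(word)
    # Start just after the last end mark before the word (rfind gives -1, so 0, if none)
    start_posn = max(wordlist.rfind(p, 0, word_posn) for p in sentence_ends) + 1
    # Stop just after the first end mark from the word onward, or at the end of the text
    hits = [i for i in (wordlist.find(p, word_posn) for p in sentence_ends) if i != -1]
    end_posn = min(hits) + 1 if hits else len(wordlist)
    print(word)
    print(wordlist[start_posn:end_posn].strip())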
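One caveat with this index-based approach: `str.find` matches substrings, so a short unique word can be found inside an earlier, longer word (searching for "ran" in "Grant smiled. The dog ran." hits "Grant" first). A minimal sketch of a fix, assuming whole-word matching is what you want, is to locate the word with a regex word boundary (`\b`) and then apply the same boundary search. The helper `sentence_around` and its `enders` parameter are hypothetical names.

```python
import re


def sentence_around(text, word, enders=".!?"):
    """Return the sentence containing the first whole-word match of `word`.

    `enders` lists the characters treated as sentence boundaries.
    Returns None if the word does not occur as a whole word.
    """
    match = re.search(r"\b" + re.escape(word) + r"\b", text)
    if match is None:
        return None
    pos = match.start()
    # Last boundary before the word; rfind gives -1 when absent, so start is 0
    start = max(text.rfind(ch, 0, pos) for ch in enders) + 1
    # First boundary at or after the word; fall back to the end of the text
    hits = [i for i in (text.find(ch, pos) for ch in enders) if i != -1]
    end = min(hits) + 1 if hits else len(text)
    # Collapse whitespace (including newlines) so the sentence prints on one line
    return " ".join(text[start:end].split())


print(sentence_around("Grant smiled. The dog ran.", "ran"))
```

`\b` treats letters, digits and underscores as word characters, so this works for ordinary words but may need adjusting for words that begin or end with punctuation.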

Also shoutout to @lb_so for the help!

lb_so

I would retrieve the index of the first letter of the chosen word in the entire string (which, for the Bible, is going to be long ;) ) and then find the nearest "." preceding that letter. I would also find the next "." after the word, perhaps enforcing a minimum length to ensure enough context for short sentences. That gives you a range to include, print, or display.

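def stringer():

    mystring = """ the quick brown fox. Which jumped over the lazy dog and died a horrible death. ad ipsum valorem"""

    # Index of the first letter of the target word
    word_posn = mystring.find("lazy")
    # Start just after the last "." before the word (0 if there is none)
    start_posn = mystring[:word_posn].rfind(".") + 1
    # End just after the first "." at or after the word
    end_posn = mystring[word_posn:].find(".") + word_posn + 1

    return '"' + mystring[start_posn:end_posn].strip() + '"'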
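The function above does not yet enforce the minimum length mentioned earlier. A minimal sketch of that idea, assuming "." is the only sentence boundary (as in `stringer`) and that widening forward first, then backward, is acceptable: keep adding whole neighbouring sentences until the slice reaches `min_len` characters. The function name and the `min_len` threshold are hypothetical, and like `stringer` it uses plain `str.find`, so it matches substrings.

```python
def context_with_min_length(text, word, min_len=40):
    """Return the '.'-delimited sentence around `word`, widened by whole
    sentences until the slice is at least `min_len` characters long.

    Returns None if `word` is not found.
    """
    pos = text.find(word)
    if pos == -1:
        return None
    start = text.rfind(".", 0, pos) + 1
    end = text.find(".", pos)
    end = len(text) if end == -1 else end + 1
    # Widen forward one sentence at a time while the slice is too short
    while end - start < min_len and end < len(text):
        nxt = text.find(".", end)
        end = len(text) if nxt == -1 else nxt + 1
    # Then widen backward: step past the "." at start - 1 to the one before it
    while end - start < min_len and start > 0:
        start = text.rfind(".", 0, start - 1) + 1
    return text[start:end].strip()


print(context_with_min_length("Hi. I ran. Then I sat down on the bench.", "ran", min_len=20))
```

Both loops always move a boundary outward, so they stop once the slice is long enough or the whole text is covered.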

This was coded very quickly, so apologies for any errors.
