Reputation: 131
I've been working on a program that finds words that appear only once in a text. However, when the program finds such a word, I want it to show some context around that word.
Here's my code.
from collections import Counter
from string import punctuation
text = str("bible.txt")
with open(text) as f:
    word_counts = Counter(word.strip(punctuation) for line in f for word in line.split())
unique = [word.lower() for word, count in word_counts.items() if count == 1]
with open(text, 'r') as myfile:
    wordlist = myfile.read().lower()
print(unique)
print(len(unique), " unique words found.")
for word in unique:
    first = 1
    second = 1
    index = wordlist.index(word)
    if wordlist[index - first:index] is not int():
        first += 1
    if wordlist[index:index + second] is not ".":
        second += 1
    print(" ")
    first_part = wordlist[index - first:index]
    second_part = wordlist[index:index + second]
    print(word)
    print("%s %s" % ("".join(first_part), "".join(second_part)))
Where this is the input text.
Ideally, it'd show
sojournings
1 Jacob lived in the land of his father's sojournings, in the land of
Canaan.
generations
2 These are the generations of Jacob.
Basically I want it to show the sentence that word is in, with verse number at the beginning. I know I'd do something with the index, but I honestly don't know how to do that.
Any help would be greatly appreciated.
Thanks, Ben
Upvotes: 0
Views: 50
Reputation: 131
I'm just gonna leave the completed code here for anyone who comes across this in the future.
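from collections import Counter
from string import punctuation
path = input("Path to file: ")
with open(path) as f:
    # Read the file once; newlines become spaces so words on adjacent lines stay separate.
    wordlist = f.read().replace('\n', ' ')
# Count every word with leading and trailing punctuation stripped.
word_counts = Counter(word.strip(punctuation) for word in wordlist.split())
# Skip the empty string left behind by punctuation-only tokens.
unique = [word for word, count in word_counts.items() if word and count == 1]
print(unique)
print(len(unique), "unique words found.")
for word in unique:
    print(" ")
    word_posn = wordlist.find(word)
    # Sentences are delimited by "." only: an expression such as "." or "!" evaluates to just ".".
    start_posn = wordlist.rfind(".", 0, word_posn) + 1
    end_posn = wordlist.find(".", word_posn)
    # If no "." follows the word, run to the end of the text.
    end_posn = len(wordlist) if end_posn == -1 else end_posn + 1
    print(word)
    print(wordlist[start_posn:end_posn])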
Also shoutout to @lb_so for the help!
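One caveat for future readers: str.find matches substrings, so a short word can be found inside a longer one, and only a single terminator character is searched for at a time. Below is a rough sketch of one way around both, assuming whole-word matching via the re module and treating ".", "!" and "?" as sentence terminators. The function name sentence_around and the sample text are made up for illustration and are not part of the code above.

```python
import re

def sentence_around(text, word, terminators=".!?"):
    """Return the sentence containing the first whole-word match of `word`, or None."""
    match = re.search(r"\b" + re.escape(word) + r"\b", text)
    if match is None:
        return None
    # Last terminator before the word; rfind gives -1 if there is none, so start becomes 0.
    start = max(text.rfind(ch, 0, match.start()) for ch in terminators) + 1
    # First terminator after the word, if any; otherwise run to the end of the text.
    ends = [p for p in (text.find(ch, match.end()) for ch in terminators) if p != -1]
    end = min(ends) + 1 if ends else len(text)
    return text[start:end].strip()

# Hypothetical sample text, loosely based on the desired output in the question.
sample = ("1 Jacob lived in the land of his father's sojournings, in the land of Canaan. "
          "2 These are the generations of Jacob. Joseph was young!")
print(sentence_around(sample, "generations"))
```

Commas are deliberately left out of the terminators here, since splitting on them would cut a verse in half; add them to the terminators argument if shorter context is preferred.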
Upvotes: 1
Reputation: 146
I would find the index of the first letter of the chosen word in the entire string (which for the Bible is going to be long), then find the last "." before that index. I would also find the next "." after it, perhaps enforcing a minimum length so that short sentences still get enough context. That gives you a range to print or display.
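def stringer():
    mystring = """ the quick brown fox. Which jumped over the lazy dog and died a horrible death. ad ipsum valorem"""
    word_posn = mystring.find("lazy")
    # Last "." before the word; rfind returns -1 if there is none, so start becomes 0.
    start_posn = mystring[:word_posn].rfind(".") + 1
    # First "." after the word, converted back to an index into the full string.
    end_posn = mystring[word_posn:].find(".") + word_posn + 1
    return '"' + mystring[start_posn:end_posn].strip() + '"'
print(stringer())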
This was coded very quickly so apologies for errors.
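The minimum-length idea mentioned above could look something like the following. This is only a sketch under my own assumptions: the function name context, the min_chars parameter and its default of 40 are invented for illustration, and the window is widened by one whole sentence on each side until it is long enough.

```python
def context(text, word, min_chars=40):
    """Return the sentence around `word`, widened until it has at least min_chars characters."""
    posn = text.find(word)
    if posn == -1:
        return None
    start = text.rfind(".", 0, posn) + 1
    end = text.find(".", posn)
    end = len(text) if end == -1 else end + 1
    # Too short? Pull in the previous and the next sentence, repeating while there is room.
    while end - start < min_chars and (start > 0 or end < len(text)):
        if start > 0:
            # text[start - 1] is the "." that ends the previous sentence; search before it.
            start = text.rfind(".", 0, start - 1) + 1
        if end < len(text):
            nxt = text.find(".", end)
            end = len(text) if nxt == -1 else nxt + 1
    return text[start:end].strip()

mystring = " the quick brown fox. Which jumped over the lazy dog and died a horrible death. ad ipsum valorem"
print(context(mystring, "fox", min_chars=40))
```

With a small min_chars the short first sentence is returned on its own; with the larger value shown, the following sentence is included as well.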
Upvotes: 1