Reputation: 126
I am doing an exercise from a textbook and have been stuck for 3 days, so I finally decided to ask for help here.
The question is: Write a program that reads the contents of a text file. The program should create a dictionary in which each key is a word found in the file and each value is a list of the line numbers on which that word appears.
For example: suppose the word “robot” is found in lines 7, 18, 94, and 138. The dictionary would contain an element in which the key was the string “robot”, and the value was a list containing the numbers 7, 18, 94, and 138.
Once the dictionary is built, the program should create another text file, known as a word index, listing the contents of the dictionary. The word index file should contain an alphabetical listing of the words that are stored as keys in the dictionary, along with the line numbers where the words appear in the original file.
Figure 9-1 shows an example of an original text file (Kennedy.txt) and its index file (index.txt).
Here is the code I have tried so far. The functions are not complete, and I am not sure what to do next:
def create_Kennedytxt():
    f = open('Kennedy.txt','w')
    f.write('We observe today not a victory\n')
    f.write('of party but a celebration\n')
    f.write('of freedom symbolizing an end\n')
    f.write('as well as a beginning\n')
    f.write('signifying renewal as well\n')
    f.write('as change\n')
    f.close()
create_Kennedytxt()
def split_words():
    f = open('Kennedy.txt','r')
    count = 0
    for x in f:
        y = x.strip()
        z = y.split(' ')  # split the line into individual words
        count+=1  # track the current line number during the loop
split_words()
Can anyone help me with the code or give me some hints? The answer shouldn't import anything; it should use only built-in methods and functions. I would really appreciate it!
Upvotes: 0
Views: 6802
Reputation: 1
from collections import Counter
fname = input("Enter file name: ")
with open (fname, 'r') as input_file:
    count = Counter(word for line in input_file
                    for word in line.split())
print(count.most_common(20))
f= open("index.txt","w+")
s = str(count.most_common(20))
f.write(s)
f.close()
Upvotes: 0
Reputation: 8508
This is a three-step process:
The dictionary will have words as keys and lists of line numbers as values.
To do this, you can create a program like this:
keys_in_file = {}
with open ('Kennedy.txt', 'r') as f:
    for i,line in enumerate(f):
        words = line.strip().split()
        for word in set(words):
            keys_in_file.setdefault(word, []).append(i+1)
print (keys_in_file)
The output of the file you provided (Kennedy.txt) is:
{'today': [1], 'victory': [1], 'observe': [1], 'a': [1, 2, 4], 'We': [1], 'not': [1], 'celebration': [2], 'of': [2, 3], 'party': [2], 'but': [2], 'freedom': [3], 'an': [3], 'symbolizing': [3], 'end': [3], 'as': [4, 5, 6], 'well': [4, 5], 'beginning': [4], 'renewal': [5], 'signifying': [5], 'change': [6]}
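To see what `set(words)` and `setdefault` contribute in the loop above, here is a small self-contained sketch that works on an in-memory list of lines instead of a file. The `index_lines` helper and the `lines` list are illustrative names, not part of the answer's code.

```python
def index_lines(lines):
    """Map each word to the list of 1-based line numbers it appears on."""
    index = {}
    for number, line in enumerate(lines, start=1):
        # set() drops repeated words on the same line, so "as well as"
        # records the line once for "as" rather than twice.
        for word in set(line.split()):
            # setdefault returns the existing list for word, or stores
            # and returns a new empty list the first time word is seen.
            index.setdefault(word, []).append(number)
    return index


lines = ["as well as a beginning", "as change"]
index = index_lines(lines)
print(index["as"])      # [1, 2]
print(index["change"])  # [2]
```

Because the lines are visited in order, each list of line numbers comes out in ascending order without any extra sorting.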
If you want to ensure that all variants of a word (We, WE, we) are counted as the same word, you need to convert the words to lowercase.
words = line.lower().strip().split()
If you want the values to be printed in the format of index.txt, add the following to the code:
for k in sorted(keys_in_file):
    print (k+':', *keys_in_file[k])
Note: I converted We to lowercase, so it appears later in the alphabetical order.
The output will be as follows:
a: 1 2 4
an: 3
as: 4 5 6
beginning: 4
but: 2
celebration: 2
change: 6
end: 3
freedom: 3
not: 1
observe: 1
of: 2 3
party: 2
renewal: 5
signifying: 5
symbolizing: 3
today: 1
victory: 1
we: 1
well: 4 5
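The loop above prints the index to the screen, while the exercise asks for a file. A minimal sketch of that last step follows, assuming the dictionary has the same shape as `keys_in_file` and using no imports. The `write_index` helper and the small `sample` dictionary are illustrative, not part of the original answer.

```python
def write_index(index, path):
    """Write one 'word: n1 n2 ...' line per key, in alphabetical order."""
    with open(path, "w") as out:
        for word in sorted(index):
            # Line numbers are ints, so convert them to strings before joining.
            numbers = " ".join(str(n) for n in index[word])
            out.write(word + ": " + numbers + "\n")


sample = {"we": [1], "a": [1, 2, 4], "as": [4, 5, 6]}
write_index(sample, "index.txt")

# Read the file back to check what was written.
with open("index.txt") as check:
    print(check.read())
```

Passing the dictionary built from Kennedy.txt instead of `sample` should produce a file with the same layout as the printed output.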
Upvotes: 1
Reputation: 563
You are on the right track. This is how it can be done:
def build_word_index(txt):
    out = {}
    for i, line in enumerate(txt.split("\n")):
        for word in line.strip().split(" "):
            if word not in out:
                out[word] = [i + 1]
            else:
                out[word].append(i + 1)
    return out
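# The text starts right after the opening quotes and ends right before the
# closing ones, so there is no empty first or last line to shift the numbering.
print(build_word_index('''We observe today not a victory
of party but a celebration
of freedom symbolizing an end
as well as a beginning
signifying renewal as well
as change'''))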
This works by first defining a dictionary:
out = {}
Then we loop over the input line by line (we use enumerate so that we have an index that starts at 0 and goes up by one on each line):
for i, line in enumerate(txt.split("\n")):
Next, we loop over each word in that line:
for word in line.strip().split(" "):
Finally, we handle two cases, starting by checking whether our dictionary does not yet contain the word:
if word not in out:
If we haven't seen the word before, we need to create an entry for it in our dictionary. We use a list so that we can handle words that appear on multiple lines. (We add 1 to i here because enumerate starts at 0.)
out[word] = [i + 1]
If we have seen the word before, we can just append the current line number to the end of its list:
out[word].append(i + 1)
This gives us a dictionary where each word is a key and the value is a list of the lines on which the word appears.
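One thing to watch for with this approach: a word that occurs twice on the same line (such as "as" in "as well as a beginning") gets that line number appended twice. A possible variant, sketched here on the assumption that repeated line numbers are not wanted, skips the append when the current line is already the last entry. The name `build_word_index_unique` is made up for illustration.

```python
def build_word_index_unique(txt):
    out = {}
    for i, line in enumerate(txt.split("\n")):
        # split() with no argument also copes with blank lines and runs of spaces.
        for word in line.split():
            if word not in out:
                out[word] = [i + 1]
            elif out[word][-1] != i + 1:
                # Record each line only once, even if the word repeats on it.
                out[word].append(i + 1)
    return out


index = build_word_index_unique(
    "as well as a beginning\nsignifying renewal as well\nas change"
)
print(index["as"])    # [1, 2, 3]
print(index["well"])  # [1, 2]
```

Checking only the last entry is enough here because the lines are processed in order, so any earlier occurrence on the same line is always the most recent one recorded.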
I am going to leave how to actually output the dictionary correctly to you.
Upvotes: 2