Reputation: 126

Write a program that reads the contents of a text file and returns an index of words and the line numbers they appear on

I am doing an exercise from a textbook and have been stuck on it for 3 days, so I finally decided to ask for help here.

The question is: write a program that reads the contents of a text file. The program should create a dictionary in which the key-value pairs are described as follows:

Key. The keys are the individual words found in the file.
Values. Each value is a list that contains the line numbers in the file where the word (the key) is found.

For example: suppose the word “robot” is found in lines 7, 18, 94, and 138. The dictionary would contain an element in which the key was the string “robot”, and the value was a list containing the numbers 7, 18, 94, and 138.
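In Python terms, the element described above is a string key mapped to a list of integers. A tiny illustration (the variable name `word_index` is made up for this example):

```python
# Hypothetical example: the "robot" entry from the exercise description
word_index = {"robot": [7, 18, 94, 138]}

# Looking up a word gives the list of line numbers it appears on
print(word_index["robot"])  # [7, 18, 94, 138]
```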

Once the dictionary is built, the program should create another text file, known as a word index, listing the contents of the dictionary. The word index file should contain an alphabetical listing of the words that are stored as keys in the dictionary, along with the line numbers where the words appear in the original file.

Figure 9-1 shows an example of an original text file (Kennedy.txt) and its index file (index.txt).

Here is the code I have tried so far. The functions are not finished, and I am not sure what to do next:

def create_Kennedytxt():
    # Write the sample text, one line per write() call
    with open('Kennedy.txt', 'w') as f:
        f.write('We observe today not a victory\n')
        f.write('of party but a celebration\n')
        f.write('of freedom symbolizing an end\n')
        f.write('as well as a beginning\n')
        f.write('signifying renewal as well\n')
        f.write('as change\n')

create_Kennedytxt()

def split_words():
    with open('Kennedy.txt', 'r') as f:
        count = 0
        for x in f:
            y = x.strip()
            z = y.split(' ')    # split the line into individual words
            count += 1          # line number of the current line (1, 2, 3, ...)

split_words()

Can anyone help me with the code or give me some hints? The answer shouldn't import anything; it should only use built-in methods and functions. I would really appreciate it!

Upvotes: 0

Answers (3)

Amany Dyab

Reputation: 1

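from collections import Counter

fname = input("Enter file name: ")
with open(fname, 'r') as input_file:
    # Counter tallies how many times each word occurs (not which lines it is on)
    count = Counter(word for line in input_file
                         for word in line.split())
print(count.most_common(20))  # the 20 most frequent (word, count) pairs

# Write the same list of pairs to index.txt
with open("index.txt", "w") as f:
    f.write(str(count.most_common(20)))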
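The question asks for a solution without imports. As a sketch, the same tally can be built with a plain dict and `dict.get`; like the code above, this counts how often each word occurs rather than recording line numbers. The helper names `count_words` and `most_common` are just illustrative.

```python
def count_words(lines):
    # Tally how many times each word occurs, using only a plain dict.
    # `lines` can be any iterable of strings, including an open file.
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def most_common(counts, n):
    # Sort (word, count) pairs by count, highest first. sorted() is stable
    # even with reverse=True, so tied words keep their first-seen order.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]
```

For example, `with open(fname) as f: print(most_common(count_words(f), 20))` plays the same role as `count.most_common(20)` above.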

Upvotes: 0

Joe Ferndz

Reputation: 8508

This is a three-step process:

1. Read the file line by line and split each line into words.
2. Identify the unique words in each line (use a set to do this).
3. For each word, check whether it exists in the dictionary.
   - If it exists, append the line number to its list (enumerate counts lines from 0, so add 1).
   - If it does NOT exist, create a new key for the word whose value is a list containing the line number.

The dictionary maps each word (the key) to a list of line numbers (the value).

To do this, you can create a program like this:

keys_in_file = {}
with open ('Kennedy.txt', 'r') as f:
    for i,line in enumerate(f):
        words = line.strip().split()
        for word in set(words):
            keys_in_file.setdefault(word, []).append(i+1) 

print (keys_in_file)

For the file you provided (Kennedy.txt), the output is:

{'today': [1], 'victory': [1], 'observe': [1], 'a': [1, 2, 4], 'We': [1], 'not': [1], 'celebration': [2], 'of': [2, 3], 'party': [2], 'but': [2], 'freedom': [3], 'an': [3], 'symbolizing': [3], 'end': [3], 'as': [4, 5, 6], 'well': [4, 5], 'beginning': [4], 'renewal': [5], 'signifying': [5], 'change': [6]}
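The line `keys_in_file.setdefault(word, []).append(i+1)` packs two steps into one: `setdefault` returns the existing list for `word`, or inserts an empty list and returns that if the word is missing. A sketch of the equivalent explicit version (the function name `add_line` is hypothetical):

```python
def add_line(index, word, line_number):
    # Same effect as: index.setdefault(word, []).append(line_number)
    if word not in index:
        index[word] = []          # first occurrence: start an empty list
    index[word].append(line_number)


index_a = {}
index_b = {}
for word, n in [("a", 1), ("a", 2), ("of", 2)]:
    add_line(index_a, word, n)
    index_b.setdefault(word, []).append(n)

print(index_a == index_b)  # True
```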

If you want all forms of a word (We, WE, we) to be counted as the same word, convert the line to lowercase before splitting it:

words = line.lower().strip().split()

If you want the values printed in the format of index.txt, add the following to the code:

for k in sorted(keys_in_file):
    print (k+':', *keys_in_file[k])

The output is as follows. Note: I converted "We" to lowercase, so it shows up as "we" near the end of the alphabetical order:

a: 1 2 4
an: 3
as: 4 5 6
beginning: 4
but: 2
celebration: 2
change: 6
end: 3
freedom: 3
not: 1
observe: 1
of: 2 3
party: 2
renewal: 5
signifying: 5
symbolizing: 3
today: 1
victory: 1
we: 1
well: 4 5
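The exercise also asks for the index to be written to a second file rather than printed. A minimal sketch that does this with built-ins only, assuming Kennedy.txt is in the current directory and that lowercasing is wanted; the function name `write_word_index` is made up here:

```python
def write_word_index(source_path, index_path):
    # Build {word: [line numbers]} from the source file
    keys_in_file = {}
    with open(source_path) as f:
        for i, line in enumerate(f, start=1):   # start=1 gives 1-based line numbers
            for word in set(line.lower().split()):
                keys_in_file.setdefault(word, []).append(i)

    # Write one "word: n n n" line per word, in alphabetical order
    with open(index_path, "w") as out:
        for word in sorted(keys_in_file):
            numbers = " ".join(str(n) for n in keys_in_file[word])
            out.write(word + ": " + numbers + "\n")
    return keys_in_file
```

Calling `write_word_index("Kennedy.txt", "index.txt")` after `create_Kennedytxt()` from the question should produce an index.txt whose lines match the listing above.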

Upvotes: 1

questionerofdy

Reputation: 563

You are on the right track. This is how it can be done:

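def build_word_index(txt):
    out = {}
    # enumerate counts from 0, so i + 1 is the 1-based line number
    for i, line in enumerate(txt.split("\n")):
        for word in line.strip().split(" "):
            if word not in out:
                # first time we see this word: start its list of line numbers
                out[word] = [i + 1]
            elif out[word][-1] != i + 1:
                # seen before: record this line, but only once per line
                # (line 4 contains "as" twice)
                out[word].append(i + 1)
    return out

# The text starts right after the opening quotes and ends right after the
# last word, so there are no empty lines (which would shift the line
# numbers and add an empty-string key).
print(build_word_index('''We observe today not a victory
of party but a celebration
of freedom symbolizing an end
as well as a beginning
signifying renewal as well
as change'''))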

This works by first defining an empty dictionary:

out = {}

Then we loop over the input line by line. We use enumerate so that we have an index that starts at 0 and goes up by one for each line:

    for i, line in enumerate(txt.split("\n")):
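Instead of adding 1 to the index each time, `enumerate` also accepts a `start` argument, so the count can begin at 1 directly. A small illustration:

```python
lines = "We observe today\nof party".split("\n")

# start=1 makes the first line number 1 instead of 0
for line_number, line in enumerate(lines, start=1):
    print(line_number, line)
# 1 We observe today
# 2 of party
```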

Next, we loop over each word in that line:

        for word in line.strip().split(" "):

Finally, we examine two cases, starting by checking whether our dictionary does not yet contain the word:

            if word not in out:

If we haven't seen the word before, we need to create an entry for it in our dictionary. We use a list so that we can handle words that appear on multiple lines. (We add 1 to i here to offset starting at 0.)

                out[word] = [i + 1]

If we have seen the word before, we just add the line we are currently on to the end of its list:

                out[word].append(i + 1)

This gives us a dictionary where each word is a key and the value is a list of the lines the word appears on.

I'll leave writing the dictionary out in the required index format to you.

Upvotes: 2
