Lightsaber

Reputation: 5

Iterate over file and get word index from other file

I need to iterate over all lines of two files (more or less at the same time) and, for each word in one file, get its index in the other.

Example:

small_wordlist:

Book
Woman
Child

big_wordlist:

Book
Man
Dog
Cat
Child
Dinosaur
Woman

And so on. The wanted result would be:

1
7
5

(or each value one less if we start counting at 0; that doesn't really matter). The result should be saved in another file.

I can't get it to work with something like this:

g = open('big_wordlist', 'r')
i = open('index_list', 'w')

with open('small_wordlist', 'r') as h:
    for line in h:
        p = h.readline()
        for num, line in enumerate(g):          # num is my found index
            if (line.startswith(p + "\n")): # need that to make sure we only get the correct word and nothing before / after it
                i.write("%s" % (num) + "\n")

So I need to iterate over the small wordlist, find the index of each word in the big wordlist, and write that index to my index list.

Now I get "mixing iteration and read methods would lose data". I wouldn't care about losing data once I have written num to my index list, since p (the current word) will change, and should, with each new line in small_wordlist.

The problem is in iterating over the small wordlist: when I replace p with "Book" it works. Now I need to make it work with a variable that holds the word from each line of my small wordlist.

Upvotes: 0

Views: 113

Answers (1)

aghast

Reputation: 15310

You don't need to process the two files at the same time. Instead, build an index of the big wordlist first, and then process the small wordlist, looking up each word in that index.

#!python3

small_wordlist = """
    Book
    Woman
    Child
""".strip()

big_wordlist = """
    Book
    Man
    Dog
    Cat
    Child
    Dinosaur
    Woman
""".strip()

import io

# Read the words from the big wordlist into word_index

#with open('big_wordlist.txt') as big:
with io.StringIO(big_wordlist) as big:
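    word_index = {}

    # enumerate() supplies the 0-based line number of each word
    for ix, line in enumerate(big):
        word = line.strip()
        # Keep the first position if a word appears more than once
        if word not in word_index:
            word_index[word] = ix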

#with open('small_wordlist.txt') as small:
with io.StringIO(small_wordlist) as small:
    for line in small:
        word = line.strip()
        if word not in word_index:
            print('-1')  # Or print('not found') or raise exception or...
        else:
            print(word_index[word])
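
The code above prints 0-based indices for the inline sample data. Since the question asks for the result to be saved in another file, here is a minimal sketch of the same lookup working on real files. The function name write_word_indices and its start parameter are my own choices for illustration, and writing -1 for a missing word is just one possible convention.

```python
def write_word_indices(big_path, small_path, out_path, start=1):
    """Write the position of each small-list word in the big list, one per line.

    start=1 gives 1-based positions (1, 7, 5 for the sample data);
    pass start=0 for 0-based positions.
    """
    word_index = {}
    with open(big_path) as big:
        for position, line in enumerate(big, start=start):
            # setdefault keeps the first position if a word repeats
            word_index.setdefault(line.strip(), position)

    with open(small_path) as small, open(out_path, 'w') as out:
        for line in small:
            # -1 marks a word that is missing from the big list (assumed convention)
            out.write("%d\n" % word_index.get(line.strip(), -1))
```

With the file names from the question, the call would be write_word_indices('big_wordlist', 'small_wordlist', 'index_list'). The big file is read only once, which avoids the problem in the original loop, where the file object g is exhausted after the first pass of the inner loop.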

Upvotes: 1
