Reputation: 5
I need to iterate over all lines in two files (more or less at the same time) and, for each word in one file, get its index in the other.
Example:
small_wordlist:
Book
Woman
Child
big_wordlist:
Book
Man
Dog
Cat
Child
Dinosaur
Woman
And so on. The wanted result would be:
1
7
5
(or one less each time if we start counting at 0; that doesn't really matter). I want to save the result in another file.
I can't get it to work with something like this:
g = open('big_wordlist', 'r')
i = open('index_list', 'w')
with open('small_wordlist', 'r') as h:
    for line in h:
        p = h.readline()
        for num, line in enumerate(g): # num is my found index
            if (line.startswith(p + "\n")): # need that to make sure we only get the correct word and nothing before / after it
                i.write("%s" % (num) + "\n")
So I need to iterate over the small wordlist, find the index of each word in the big wordlist, and write it to my index list.
Now I get "mixing iteration and read methods would lose data". I would not care about that: after I have written num into my index list, p (the current word) will change (and should) anyway with each new line in small_wordlist.
The problem is in my iteration over the small wordlist. When I replace p with "Book" it works; now I need to make it work with a variable that holds the word from each line of my small wordlist.
Upvotes: 0
Views: 113
Reputation: 15310
You don't need to process the two files at the same time. Instead, build an index of the big wordlist first, and then process the small wordlist, looking up each word in the index.
#!python3
small_wordlist = """
Book
Woman
Child
""".strip()
big_wordlist = """
Book
Man
Dog
Cat
Child
Dinosaur
Woman
""".strip()
import io

# Read the words from the big wordlist into word_index
#with open('big_wordlist.txt') as big:
with io.StringIO(big_wordlist) as big:
    ix = 0
    word_index = {}
    for line in big:
        word = line.strip()
        if word not in word_index:
            word_index[word] = ix
        ix += 1

#with open('small_wordlist.txt') as small:
with io.StringIO(small_wordlist) as small:
    for line in small:
        word = line.strip()
        if word not in word_index:
            print('-1')  # Or print('not found') or raise exception or...
        else:
            print(word_index[word])
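Since you want the indices saved to another file rather than printed, the same idea can be wrapped in two small functions. This is only a minimal sketch: the names build_index and write_indices are my own, the file names are taken from your question, and I'm assuming one word per line with -1 as the marker for a word that isn't in the big wordlist.

```python
def build_index(lines):
    """Map each word to the 0-based position of its first occurrence."""
    word_index = {}
    for ix, line in enumerate(lines):
        word = line.strip()
        # setdefault keeps the earliest position if a word is repeated
        word_index.setdefault(word, ix)
    return word_index


def write_indices(big_path, small_path, out_path):
    """Write the index of each small-list word, one per line, to out_path."""
    # Read the big wordlist once and build the lookup table
    with open(big_path) as big:
        word_index = build_index(big)

    # Look up every word of the small wordlist and write its index
    with open(small_path) as small, open(out_path, 'w') as out:
        for line in small:
            word = line.strip()
            # -1 marks a word that is missing from the big wordlist (assumption)
            out.write("%d\n" % word_index.get(word, -1))
```

You would call it as write_indices('big_wordlist', 'small_wordlist', 'index_list'). The big file is read only once, so each lookup is a dictionary access instead of another pass over the file. If you prefer 1-based numbers, add 1 to the found index before writing.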
Upvotes: 1