I am currently writing a program to scan a PDF file and look for the keyword 'Ref'. Once this word is found, I need to take the next two strings, 'code' and 'shares', and add them to a new list to be imported into Excel later.
I have written code to take the text from the PDF file and add it to a list. I then iterate through this list looking for the 'Ref' keyword. When the first one is found, it is added to the list with no problem. However, for every later match, it adds the first instance of 'Ref' (plus its code and shares) to the list again instead of the next one in the PDF file...
Here is the code for adding the Ref + code + shares to the new list (Python 3):
for word in wordList:
    match = 'false'
    if word == 'Ref':
        match = 'true'
        ref = word
        code = wordList[wordList.index(ref)+1]
        shares = wordList[wordList.index(ref)+2]
    if match == 'true':
        refList.append(ref)
        refList.append(code)
        refList.append(shares)
Here is the output:
['Ref', '1', '266', 'Ref', '1', '266', 'Ref', '1', '266', 'Ref', '1', '266', 'Ref', '1', '266', 'Ref', '1', '266']
As you can see, it's the same reference number each time... The correct output should be something like this:
['Ref', '1', '266', 'Ref', '2', '642', 'Ref', '3', '435', 'Ref', '4', '6763'] etc...
If anyone knows why it always adds the first ref and code for every instance of 'Ref' in the wordList, let me know! I am quite stuck! Thanks
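A minimal reproduction of this behavior, using a small hypothetical word list in place of the PDF text, shows that every match reads the two words after the *first* 'Ref', because `wordList.index(word)` always returns the same position:

```python
# Hypothetical sample data standing in for the words extracted from the PDF.
wordList = ['this', 'Ref', '1', '266', 'that', 'Ref', '2', '642']
refList = []

for word in wordList:
    if word == 'Ref':
        # list.index() returns the position of the FIRST 'Ref' every time,
        # so both matches read the same two following words.
        start = wordList.index(word)
        refList += [word, wordList[start + 1], wordList[start + 2]]

print(refList)  # ['Ref', '1', '266', 'Ref', '1', '266']
```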
Your issue is that calling the index method of wordList only returns the first instance it finds, i.e. you will always get the first instance of "Ref". A better approach is to use enumerate over the list, which gives you the index and value of each entry as you go; then you can use the index to get the next two elements. Below is a code example.
data = """
this
Ref
1
266
that
hello
Ref
2
642"""
refList = []
wordList = [item.rstrip() for item in data.splitlines()]
for index, word in enumerate(wordList):
    match = 'false'
    if word == 'Ref':
        match = 'true'
        ref = word
        code = wordList[index+1]
        shares = wordList[index+2]
    if match == 'true':
        refList.append(ref)
        refList.append(code)
        refList.append(shares)
print(refList)
Output:
['Ref', '1', '266', 'Ref', '2', '642']
You could also clean up and remove a lot of the unneeded code and just write it as:
for index, word in enumerate(wordList):
    if word == 'Ref':
        refList += [word, wordList[index+1], wordList[index+2]]
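Another option, not taken from the answers here, is to walk the list in overlapping windows of three with the built-in `zip()`. The index-based versions raise `IndexError` when a 'Ref' sits in the last two positions of the list; `zip()` stops at its shortest input, so that case is simply skipped. This sketch also keeps each match as a tuple, on the assumption that one tuple per row is convenient for the later Excel import; the last line flattens it back into the original list shape. The sample data is hypothetical.

```python
# Hypothetical sample data; note the trailing 'Ref' with nothing after it.
wordList = ['this', 'Ref', '1', '266', 'that', 'Ref', '2', '642', 'Ref']

# zip() pairs each word with the two words that follow it and stops at the
# shortest slice, so a 'Ref' near the end can never cause an IndexError.
rows = [
    (word, code, shares)
    for word, code, shares in zip(wordList, wordList[1:], wordList[2:])
    if word == 'Ref'
]

# Flatten the (ref, code, shares) tuples if the flat list is preferred.
refList = [item for row in rows for item in row]

print(rows)     # [('Ref', '1', '266'), ('Ref', '2', '642')]
print(refList)  # ['Ref', '1', '266', 'Ref', '2', '642']
```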
When you use the list.index(str) function, it returns the index of the first occurrence of str. To fix this, iterate by index:
for i in range(len(wordList)):
    match = False
    # look at the word at position i (there is no `word` variable in this loop)
    if wordList[i] == 'Ref':
        match = True
        ref = wordList[i]
        code = wordList[i+1]
        shares = wordList[i+2]
    if match:
        refList.append(ref)
        refList.append(code)
        refList.append(shares)
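If you would rather keep using `list.index()`, it also accepts an optional start position, `list.index(x, start)`, so each search can begin just after the previous match. A minimal sketch, assuming the same hypothetical sample data as above:

```python
# Hypothetical sample data standing in for the words extracted from the PDF.
wordList = ['this', 'Ref', '1', '266', 'that', 'Ref', '2', '642']
refList = []

pos = -1
while True:
    try:
        # Search from one past the previous match, so each call
        # finds the next 'Ref' instead of the first one again.
        pos = wordList.index('Ref', pos + 1)
    except ValueError:
        break  # no more 'Ref' entries in the list
    # Slicing never raises: a 'Ref' near the end just yields fewer items.
    refList += wordList[pos:pos + 3]

print(refList)  # ['Ref', '1', '266', 'Ref', '2', '642']
```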
I hope this helps. Cheers!