henry434

Reputation: 107

While iterating a list, add the value and the next 2 values into a new list

I am writing a program that scans a PDF file for the keyword 'Ref'. Each time this word is found, I need to take the next two strings, 'code' and 'shares', and add them to a new list to be imported into Excel later.

I have written code that extracts the text from the PDF file into a list. I then iterate through this list looking for the 'Ref' keyword. The first match is added to the new list without any problem. However, on every later match it adds the first instance of 'Ref' (plus its code and shares) again instead of the next one in the PDF file.

Here is the code that adds the Ref + code + shares to the new list (Python 3):

for word in wordList:
    match = 'false'

    if word == 'Ref':
        match = 'true'
        ref = word
        code = wordList[wordList.index(ref)+1]
        shares = wordList[wordList.index(ref)+2]

    if match == 'true':
        refList.append(ref)
        refList.append(code)
        refList.append(shares)

Here is the output:

['Ref', '1', '266','Ref', '1', '266','Ref', '1', '266','Ref', '1', '266','Ref', '1', '266','Ref', '1', '266']

As you can see, it's the same reference number each time. The correct output should look something like this:

['Ref', '1', '266','Ref', '2', '642','Ref', '3', '435','Ref', '4', '6763'] etc...

If anyone knows why it always adds the first ref and code for every instance of 'Ref' in the wordList, let me know! I am quite stuck! Thanks

Upvotes: 0

Views: 46

Answers (2)

Chris Doyle

Reputation: 12199

Your issue is that the call to the index method of wordList only returns the position of the first match it finds, i.e. you will always get the first instance of "Ref". A better approach is to use enumerate over the list, which gives you the index and value of each entry as you go; you can then use that index to get the next two elements. Below is a code example.

data = """
this
Ref
1
266
that
hello
Ref
2
642"""

refList = []
wordList = [item.rstrip() for item in data.splitlines()]
for index, word in enumerate(wordList):
    match = 'false'

    if word == 'Ref':
        match = 'true'
        ref = word
        code = wordList[index+1]
        shares = wordList[index+2]

    if match == 'true':
        refList.append(ref)
        refList.append(code)
        refList.append(shares)
print(refList)

OUTPUT

['Ref', '1', '266', 'Ref', '2', '642']

You could also remove a lot of the unneeded code and just write it as:

for index, word in enumerate(wordList):
    if word == 'Ref':
        refList += [word, wordList[index+1], wordList[index+2]]
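
One thing the loop above does not handle is a 'Ref' that appears as the last or second-to-last item, where index+2 would raise an IndexError. Here is a minimal sketch of a guarded variant, assuming you would rather skip such an incomplete entry; the function name collect_refs and the grouping into one row per match are my own choices for illustration, not part of the original code:

```python
def collect_refs(word_list, keyword="Ref"):
    """Return one [keyword, code, shares] row per complete match."""
    rows = []
    for index, word in enumerate(word_list):
        # Only accept a match that has two more items after it.
        if word == keyword and index + 2 < len(word_list):
            rows.append(word_list[index:index + 3])
    return rows


# The trailing 'Ref' has nothing after it, so it is skipped.
words = ["this", "Ref", "1", "266", "that", "hello", "Ref", "2", "642", "Ref"]
rows = collect_refs(words)
print(rows)  # [['Ref', '1', '266'], ['Ref', '2', '642']]
```

Keeping each match as its own row may also make the later Excel import easier, since each row maps to one spreadsheet line.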

Upvotes: 1

Nate Scholnick

Reputation: 15

When you call list.index(value), it returns the index of the first occurrence of value, no matter where your loop currently is. To fix this, iterate by index:

for i in range(len(wordList)):
    match = False

    # Compare the element at the current index
    if wordList[i] == 'Ref':
        match = True
        ref = wordList[i]
        code = wordList[i+1]
        shares = wordList[i+2]

    if match:
        refList.append(ref)
        refList.append(code)
        refList.append(shares)
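
To see the behaviour for yourself, here is a small sketch with a made-up list (the values are only for illustration). It also shows the optional start argument of list.index, which begins the search from a given position:

```python
words = ["Ref", "1", "266", "Ref", "2", "642"]

# Without a start position, index() always finds the first 'Ref'.
first = words.index("Ref")

# With a start position, the search begins after the previous match.
second = words.index("Ref", first + 1)

print(first, second)  # 0 3
```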

I hope this helps. Cheers!

Upvotes: 0
