Reputation: 23

Adjusting iteration amounts in a Python loop

I'm trying to write an algorithm that goes through a list of strings, joins consecutive strings together if they meet a certain criterion, and then skips ahead by the number of strings it joined, so that parts of the same joined string aren't counted twice.

I understand that i = i + x or i += x inside a for loop doesn't change how far the loop advances on each iteration, so I'm looking for another way to skip ahead by a variable number of iterations.

Background: I'm trying to build a named entity recognition algorithm for news articles. I tokenise the text ('Prime Minister Jacinda Ardern is from New Zealand') into ('Prime', 'Minister', 'Jacinda', 'Ardern', 'is', ...) and run the NLTK POS tagger over it, which gives ...('Jacinda', 'NNP'), ('Ardern', 'NNP'), ('is', 'VBZ')... I then combine words when the following words are also 'NNP' (proper nouns).

The goal is to count 'Prime Minister Jacinda Ardern' as 1 string rather than 4, and then to skip ahead in the loop by that many words, so that the next strings aren't 'Minister Jacinda Ardern' and then 'Jacinda Ardern'.

Context: text is a list of (word, tag) tuples created by tokenising and then POS tagging my article, in the format: [...('She', 'PRP'), ('said', 'VBD'), ('the', 'DT'), ('roughly', 'RB'), ('25-minute', 'JJ'), ('meeting', 'NN')...]. 'NNP' means proper noun, i.e. the names of places, people, organisations, etc.

InitialNamedEntity = []  # collects the joined named entities

for i in range(len(text)):
    print(i)

    # initialise wordcounter for this position
    wordcounter = 0

    # if text[i] is a proper noun, set namedEnt to the word,
    # then increase wordcounter by 1
    if text[i][1] == 'NNP':
        namedEnt = text[i][0]
        wordcounter += 1

        # while the next word in text is also a proper noun,
        # increase wordcounter by 1 and initialise j to 1
        while text[i + wordcounter][1] == 'NNP':
            wordcounter += 1
            j = 1

            # while j is less than wordcounter, join text[i+j] to
            # namedEnt and increase j by 1; when that is no longer
            # the case, append namedEnt to the named entity list
            while j < wordcounter:
                namedEnt = ' '.join([namedEnt, text[i + j][0]])
                j += 1
            InitialNamedEntity.append(namedEnt)

        # meant to skip past the joined words, but the for loop overwrites i
        i += wordcounter

If I print(i) at the start of the loop, it goes up by 1 each time. When I print a Counter of the named entity list built from the namedEnts, the results are as follows: (...'New Zealand': 7, 'Zealand': 7, 'United': 4, 'Prime Minister Minister Jacinda Minister Jacinda Ardern': 3...)

So not only am I getting double counts, as with 'New Zealand' and 'Zealand', but I'm also getting strange results like 'Prime Minister Minister Jacinda Minister Jacinda Ardern'.
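To see where the repeated words come from, here is a small trace of just the inner loops for one starting position. It's a sketch: the function name buggy_inner and the sample tags are made up for illustration (real NLTK tags may differ), but the loop body follows the code above. Because namedEnt is never reset, each later pass of the middle while loop joins the words from text[i + 1] onward on top of the previous result.

```python
def buggy_inner(text, i):
    """Hypothetical helper: the question's inner loops for one value of i,
    returning what would be appended to InitialNamedEntity."""
    appended = []
    wordcounter = 0
    if text[i][1] == 'NNP':
        namedEnt = text[i][0]
        wordcounter += 1
        while text[i + wordcounter][1] == 'NNP':
            wordcounter += 1
            j = 1
            # namedEnt is never reset, so on the second and later passes
            # the words from text[i + 1] onward are joined on again
            while j < wordcounter:
                namedEnt = ' '.join([namedEnt, text[i + j][0]])
                j += 1
            appended.append(namedEnt)
    return appended

# assumed sample tags, for illustration only
tagged = [('Prime', 'NNP'), ('Minister', 'NNP'), ('Jacinda', 'NNP'),
          ('Ardern', 'NNP'), ('is', 'VBZ')]
for name in buggy_inner(tagged, 0):
    print(name)
# Prime Minister
# Prime Minister Minister Jacinda
# Prime Minister Minister Jacinda Minister Jacinda Ardern
```

On the next outer iteration (i = 1) the same thing starts again from 'Minister', so overlapping fragments of the same name are collected as well.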

The results I want are ('New Zealand': 7, 'United States': 4, 'Prime Minister Jacinda Ardern': 3).

Any help would be greatly appreciated. Cheers

Upvotes: 1

Answers (3)

Ameth Rawat

Reputation: 23

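Thanks for the help, everyone. I used the while loop shown by Barmar:

i = 0
while i < len(text):
    ...
    i += wordcounter

and replaced the final increment with an if/else statement, because wordcounter is 0 whenever the current word isn't a proper noun, and i += 0 would leave the loop stuck on that word forever:

if wordcounter > 0:
    i += wordcounter
else:
    i += 1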
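Putting those pieces together, a complete version of this approach might look like the sketch below. The function name merge_proper_nouns and the sample tagged list are illustrative assumptions; the input is assumed to be a list of (word, tag) tuples like the one nltk.pos_tag returns. It also checks i + wordcounter < len(text) before indexing, so a proper noun at the very end of the text doesn't raise an IndexError, and it builds each name once from a slice instead of re-joining inside a nested loop.

```python
from collections import Counter

def merge_proper_nouns(text):
    """Hypothetical helper: join runs of consecutive 'NNP' tokens.

    text is assumed to be a list of (word, tag) tuples.
    """
    entities = []
    i = 0
    while i < len(text):
        wordcounter = 0
        # count consecutive proper nouns starting at position i,
        # stopping at the end of the list
        while i + wordcounter < len(text) and text[i + wordcounter][1] == 'NNP':
            wordcounter += 1
        if wordcounter > 0:
            words = [word for word, _ in text[i:i + wordcounter]]
            entities.append(' '.join(words))
            i += wordcounter  # skip past every word that was just joined
        else:
            i += 1  # not a proper noun: move on by one word
    return entities

# assumed sample tags, for illustration only
tagged = [('Prime', 'NNP'), ('Minister', 'NNP'), ('Jacinda', 'NNP'),
          ('Ardern', 'NNP'), ('is', 'VBZ'), ('from', 'IN'),
          ('New', 'NNP'), ('Zealand', 'NNP')]
names = merge_proper_nouns(tagged)
print(names)  # ['Prime Minister Jacinda Ardern', 'New Zealand']
print(Counter(names)['New Zealand'])  # 1
```

Passing the returned list to collections.Counter gives per-name counts like the ones in the question, without fragments such as 'Zealand' on their own.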

Upvotes: 0

Michiel

Reputation: 117

range() creates an iterable object. The for...in construct gets an iterator from it and calls next() on that iterator, and each call returns the next value in the sequence. So your i variable isn't an index into that sequence; it's just the latest value produced by the iterator. Modifying i has no effect, because it is overwritten as soon as the next value is retrieved.

This is very different from a loop like for (int i = 0; i < 5; i++) {} in C, where there is no concept of a sequence: it just checks whether i is less than 5 before each execution of the block, so changing i inside the block does affect the next iteration.

Compare it to this:

for i in [2, -1, -4]:
  print(i)
  i = i + 2

Perhaps here it is more obvious that setting i will have no effect.
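One way to picture this is to write the for loop out by hand. The following is a simplified model of what for i in range(n) does (the interpreter doesn't literally run this code, and loop_values is a made-up name): the iterator is created once, and i is rebound from next() at the top of every pass, so anything assigned to i in the body is thrown away.

```python
def loop_values(n):
    """Hypothetical helper: a hand-written model of `for i in range(n)`."""
    seen = []
    it = iter(range(n))  # the iterator is created once, before the first pass
    while True:
        try:
            i = next(it)  # i is rebound here on every pass...
        except StopIteration:
            break
        seen.append(i)
        i += 2  # ...so this change is lost when the next pass starts
    return seen

print(loop_values(5))  # [0, 1, 2, 3, 4]
```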

But you can write that C-like construct in Python too, using a while loop:

i = 0
while i < 6:
  print(i)
  if i == 2:
    i = i + 2
  else:
    i = i + 1

This prints:

0
1
2
4
5

See how it didn't output 3? When it got to i == 2, it added 2 so it skipped over 3. You can do something similar in your code.

(These examples are Python 3.)

Upvotes: 1

Barmar

Reputation: 781380

Don't use a for loop if you need to adjust how i is incremented, as it always sets it to the next value in the range. Use a while loop:

i = 0
while i < len(text):
    ...
    i += wordcounter
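If you would rather keep a for loop, another option (not from the answers above) is to hold on to the iterator yourself and advance it with next(), since the for loop takes its next value from that same iterator. A minimal sketch, where merge_with_for and the sample data are hypothetical names:

```python
def merge_with_for(text):
    """Hypothetical helper: join runs of consecutive 'NNP' tokens,
    skipping ahead by advancing the loop's own iterator."""
    entities = []
    indices = iter(range(len(text)))
    for i in indices:
        wordcounter = 0
        # count consecutive proper nouns starting at position i
        while i + wordcounter < len(text) and text[i + wordcounter][1] == 'NNP':
            wordcounter += 1
        if wordcounter > 0:
            entities.append(' '.join(word for word, _ in text[i:i + wordcounter]))
            # the for loop already moves on by 1, so discard the other
            # wordcounter - 1 indices from the same iterator
            for _ in range(wordcounter - 1):
                next(indices, None)
    return entities

# assumed sample tags, for illustration only
tagged = [('Prime', 'NNP'), ('Minister', 'NNP'), ('Jacinda', 'NNP'),
          ('Ardern', 'NNP'), ('is', 'VBZ'), ('from', 'IN'),
          ('New', 'NNP'), ('Zealand', 'NNP')]
print(merge_with_for(tagged))  # ['Prime Minister Jacinda Ardern', 'New Zealand']
```

The while loop is usually easier to read; this variant mainly shows that it's the iterator, not i, that decides which value the for loop produces next.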

Upvotes: 1
