Nate

Reputation: 3

Fixing Nested For Loops

I am having some trouble getting some nested 'for loops' to work the way I need them to. I've been searching for an answer, and this seems to be a common problem, but as I am still fairly new to Python, the explanations have not been much help. I have a list of words and a file. For each word in the list, I would like to go through the file line by line and print every line that contains the word. Currently, when I run the code, it only prints the lines that contain the first word in the list and does not continue with the rest of the words in the list.

Can you please offer some suggestions on how I can make this work?

Note: I know 'Efficiency' is spelled incorrectly, it's a problem with the data source.

EDIT: I need the lines to be grouped, so all the lines containing 'Speed' printed, then all the lines containing 'Acceleration' printed etc. All the lines in the file contain one of the words in SHEETS.

SHEETS = [' Speed',' Acceleration',' Engine Power',' Instantaneous Fuel Effeciency',
          ' Average Fuel Effeciency',' Instantaneous MPG',' Average MPG',
          ' MAF air flow rate',' Accelerator pedal position E',
          ' Commanded throttle actuator']


with open('userdata.log','r',encoding = 'utf-8') as my_file:
    for label in SHEETS:
        for line in my_file:
            if label in line:
                print (line)

Output:

2014-09-20 14:08:41.165, Speed, 0, mph

2014-09-20 14:08:43.742, Speed, 0, mph

2014-09-20 14:08:47.872, Speed, 0, mph

2014-09-20 14:08:49.490, Speed, 0, mph

2014-09-20 14:08:51.007, Speed, 0, mph

2014-09-20 14:08:52.456, Speed, 0, mph

2014-09-20 14:08:53.888, Speed, 0, mph

2014-09-20 14:08:55.499, Speed, 0, mph

2014-09-20 14:08:57.288, Speed, 0, mph

2014-09-20 14:08:57.838, Speed, 0, mph

2014-09-20 14:08:58.355, Speed, 0, mph

2014-09-20 14:08:58.572, Speed, 0, mph

Upvotes: 0

Views: 108

Answers (3)

Burhan Khalid

Reputation: 174624

This happens because the first time this loop runs:

for label in SHEETS:
    for line in my_file:

It goes through the entire file and then stops (it doesn't "rewind" and start again from the top). So what it's doing is taking the first word and searching the entire file. When the outer loop moves on to the next word, the file has already been read to the end, so the inner loop has no lines left to go through and your other words are never found.
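A minimal sketch of this behaviour, using an in-memory io.StringIO as a stand-in for userdata.log (the two sample lines are made up for illustration):

```python
import io

# Stand-in for the real log file; the contents are invented for this demo.
fake_file = io.StringIO(
    "2014-09-20 14:08:41.165, Speed, 0, mph\n"
    "2014-09-20 14:08:41.300, Acceleration, 0, g\n"
)

first_pass = [line for line in fake_file]   # reads every line
second_pass = [line for line in fake_file]  # the file is already at its end

print(len(first_pass))
print(len(second_pass))
```

The second pass finds nothing because the first pass left the file position at the end, which is the same thing that happens to every label after the first one in the question's loops.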

The simple solution in your situation is to switch your logic: for each line in the file, see if it contains any of the words. This way you read the file once and check each line against all the words (instead of the less efficient approach of scanning the entire file once per word).

The end result is the same set of lines: you will print every line that contains one of the words you are after, in file order rather than grouped by word. The implementation is simple; just switch the order of your loops:

with open('userdata.log','r',encoding='utf-8') as my_file:
    for line in my_file:
        for label in SHEETS:
            if label in line:
                print(line)

I need the lines to be grouped, so all the lines containing 'Speed' printed, then all the lines containing 'Acceleration' printed etc. All the lines in the file contain one of the words in SHEETS.

Ah, this is something else. For this you need to use a dictionary, which is Python's key/value store container.

A dictionary is a place where you can store or group things and refer to them by a key.

In your situation, you want to group all the lines that match the words together, so your key would be the word, and the things would be a collection of lines. In the dictionary, each key would have a list as a value (a list is one of the many container types, another is a tuple).

lines_by_word = {}  # This is how you create an empty dictionary
with open('userdata.log', 'r', encoding='utf-8') as my_file:
    for line in my_file:
        for label in SHEETS:
            if label in line:
                # We have a match, so the next step is to collect it.
                # If this is the first time we have encountered this
                # word, we first need to add it to the dictionary.
                if label not in lines_by_word:
                    # An "in" test on a dictionary (called a membership
                    # test) checks its keys. If the word isn't there yet,
                    # create a blank list for it in the dictionary.
                    lines_by_word[label] = []

                # Add the matching line to the list for that word
                lines_by_word[label].append(line)

for word, lines in lines_by_word.items():
    print('There are a total of {} lines for {}'.format(len(lines), word))
    for line in lines:
        print(line)
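As a possible shortcut (a different technique from the plain dictionary used above), collections.defaultdict from the standard library creates the empty list on first access, so the membership test is not needed. This is only a sketch: the shortened SHEETS list and the in-memory sample lines are assumptions standing in for the real data.

```python
import io
from collections import defaultdict

SHEETS = [' Speed', ' Acceleration']  # shortened list, for the demo only

# Invented sample data standing in for userdata.log
sample = io.StringIO(
    "2014-09-20 14:08:41.165, Speed, 0, mph\n"
    "2014-09-20 14:08:41.300, Acceleration, 0, g\n"
    "2014-09-20 14:08:43.742, Speed, 0, mph\n"
)

lines_by_word = defaultdict(list)  # a missing key starts out as an empty list
for line in sample:
    for label in SHEETS:
        if label in line:
            lines_by_word[label].append(line.rstrip('\n'))

for word, lines in lines_by_word.items():
    print('There are a total of {} lines for {}'.format(len(lines), word.strip()))
    for line in lines:
        print(line)
```

One difference worth knowing: a label with no matching lines never appears in the defaultdict unless you look it up, so loop over SHEETS instead if you need a fixed output order.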

Upvotes: 1

user2357112

Reputation: 280648

Not everything in Python supports repeated iteration. Generally, there are two categories of iterables: iterators, which you can only iterate over once, and multi-use iterables, which can be iterated over as many times as you like. File objects fall into the first category.
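One way to see the difference, sketched with a list and an io.StringIO object that behaves like an open text file (both values are made up for the demo):

```python
import io

words = ['Speed', 'Acceleration']            # a list: multi-use iterable
file_like = io.StringIO("line 1\nline 2\n")  # behaves like an open text file

# An iterator hands back itself from iter(); a multi-use iterable
# hands back a fresh iterator every time.
print(iter(file_like) is file_like)
print(iter(words) is words)

# Looping twice: the list yields its items both times, the file only once.
list_twice = (list(words), list(words))
file_twice = (list(file_like), list(file_like))
print(list_twice)
print(file_twice)
```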

If it's important that you get the specific result order you were expecting, you can reset the file position to the beginning after looping over it:

with open('userdata.log','r',encoding = 'utf-8') as my_file:
    for label in SHEETS:
        for line in my_file:
            if label in line:
                print (line)
        my_file.seek(0)

You might also consider exchanging the order of your loops and gathering lines into lists for each label before printing them at the end. This could run faster, due to less I/O:

labeled_lines = {label: [] for label in SHEETS}
with open('userdata.log','r',encoding = 'utf-8') as my_file:
    for line in my_file:
        for label in SHEETS:
            if label in line:
                labeled_lines[label].append(line)
                break
        else:
            # else on a loop means "if the loop didn't end with a break."
            # ValueError is only a stand-in; raise whatever suits your program.
            raise ValueError('no known label in line: {!r}'.format(line))
for label in SHEETS:
    for line in labeled_lines[label]:
        print(line)

Finally, the lines you'll be reading from the file will usually have line break characters at the end. (The only possible exception is the last line of a file.) Since print adds its own line break, this will result in an empty line after every line of output. You might want to strip line breaks to avoid this.
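Two common ways to do that, shown on a single sample line rather than the real file (the line is copied from the output in the question):

```python
raw_line = "2014-09-20 14:08:41.165, Speed, 0, mph\n"

# Option 1: remove the trailing line break before printing.
stripped = raw_line.rstrip('\n')
print(stripped)

# Option 2: keep the line as-is and tell print not to add its own line break.
print(raw_line, end='')
```

Either option can replace the plain print(line) call in the loops above.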

Upvotes: 0

Tony Suffolk 66

Reputation: 9704

I think maybe you mean:

SHEETS = [' Speed',' Acceleration',' Engine Power',' Instantaneous Fuel Effeciency',
          ' Average Fuel Effeciency',' Instantaneous MPG',' Average MPG',
          ' MAF air flow rate',' Accelerator pedal position E',
          ' Commanded throttle actuator']


with open('userdata.log','r',encoding = 'utf-8') as my_file:
    for line in my_file:
        for label in SHEETS:
            if label in line:
                print (line)

Nested loops go from outer to inner: for each line in the file, the inner loop checks whether any of the labels appears in that line.

Upvotes: 0
