dartdog

Reputation: 10862

Python If vs. While?

I have a small file-read routine and I want only the first 200 records. I have it working, but along the way I could not figure out what was wrong with using the "while" construct. This code works:

import csv, sys, zipfile
sys.argv[0] = "/home/tom/Documents/REdata/AllListing1RES.zip"
zip_file    = zipfile.ZipFile(sys.argv[0])
items_file  = zip_file.open('AllListing1RES.txt', 'rU')
rows = []
for row_index, row in enumerate(csv.DictReader(items_file, dialect='excel', delimiter='\t')):
    if (row_index < 200):
        rows.append(row)
    else : break

This code runs until it fails with an out-of-memory condition. I would have thought it was equivalent:

import csv, sys, zipfile
sys.argv[0] = "/home/tom/Documents/REdata/AllListing1RES.zip"
zip_file    = zipfile.ZipFile(sys.argv[0])
items_file  = zip_file.open('AllListing1RES.txt', 'rU')
rows = []
for row_index, row in enumerate(csv.DictReader(items_file, dialect='excel', delimiter='\t')):
    while (row_index < 200):
        rows.append(row)
    else : break

So what would be the right construct using while?

Upvotes: 2

Views: 8596

Answers (5)

Matthis Kohli

Reputation: 1995

I was curious which would be faster, so I put together a quick example in VB.NET. I don't know whether my code has logical errors, but when doing only 100000 loops the while loop is faster.

With large amounts of data the time difference is huge.

This has nothing to do with the actual question, but it somehow fits the IF vs. WHILE theme.

Public Class Form1
    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        Dim watch As New Stopwatch
        Dim i As Integer = 0

        For loops As Integer = 0 To 100000000

            watch.Start()
            If True Then
                i += 1
            End If
            watch.Stop()
        Next
        MessageBox.Show(watch.ElapsedMilliseconds) ' 2740
    End Sub
    Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click
        Dim watch As New Stopwatch
        Dim loops As Integer = 0

        watch.Start()
        While loops < 100000000
            loops += 1
        End While
        watch.Stop()

        MessageBox.Show(watch.ElapsedMilliseconds) ' 300
    End Sub
End Class

Upvotes: 0

ncoghlan

Reputation: 41486

The more traditional way of writing that loop would be:

for row_index, row in enumerate(csv.DictReader(items_file, dialect='excel', delimiter='\t')):
    if (row_index >= 200):
        break
    rows.append(row)

As soon as the row counter hits 200, we bail out of the loop.

To use a while loop instead of a for loop (note that, as a looping construct, while is an alternative to for rather than to if) it is necessary to step through the iterator manually:

itr = enumerate(csv.DictReader(items_file, dialect='excel', delimiter='\t'))
row_index = -1
while row_index < 199:
    try:
        row_index, row = next(itr) # Python 3. Use itr.next() in Python 2
    except StopIteration:
        break # Ran out of data
    rows.append(row)

All that said, there's actually a superior alternative to both of these options available in the itertools module:

from itertools import islice
itr = csv.DictReader(items_file, dialect='excel', delimiter='\t')
rows = list(islice(itr, 200))
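To try the islice approach without the original zip file, here is a minimal sketch. The in-memory sample data and the names sample, first_two and remaining are made up for illustration; only csv.DictReader and itertools.islice come from the answer above.

```python
import csv
import io
from itertools import islice

# Hypothetical in-memory stand-in for the tab-delimited listing file.
sample = io.StringIO("id\tprice\n1\t100\n2\t200\n3\t300\n4\t400\n")

reader = csv.DictReader(sample, dialect='excel', delimiter='\t')

# islice stops pulling rows from the reader once it has yielded two of them.
first_two = list(islice(reader, 2))

# Asking for more rows than are left is not an error; islice simply stops
# when the underlying reader is exhausted.
remaining = list(islice(reader, 200))

print([row['id'] for row in first_two])
print(len(remaining))
```

Because islice ends quietly when the data runs out, there is no need to catch StopIteration yourself, which is what the manual while version has to do.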

Upvotes: 3

Mike Lewis

Reputation: 64137

They are not equivalent: your while loop has the condition row_index < 200, which will never become false because row_index never changes while you are in that loop.

This is why you run out of memory: you are stuck in an infinite loop that keeps appending to rows.

You are essentially saying:

Pseudocode:

stay in block one as long as row_index < 200:

block_one:
   rows.append(row)
   goto block_one

You can see that row_index will never change, thus you are going to be in block_one forever.

Whereas the if statement has the following pseudocode:

if row_index < 200 goto block_one otherwise break:

  block_one:
    rows.append(row)

You can see that block_one is not going back to itself, like you see in the while loop.
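For contrast, here is a rough sketch of a while loop that does terminate, because its body changes the variable the condition tests. The list called source and the name limit are hypothetical stand-ins for the rows coming out of csv.DictReader and for the 200-row cutoff; this is not the asker's code.

```python
# Hypothetical stand-in for the rows produced by csv.DictReader.
source = [{"id": n} for n in range(500)]

limit = 200  # same cutoff as in the question
rows = []
row_index = 0

# The body advances row_index, so the condition eventually becomes false.
# The second check guards against a source with fewer than `limit` rows.
while row_index < limit and row_index < len(source):
    rows.append(source[row_index])
    row_index += 1

print(len(rows))
```

The point is only that a while condition has to depend on something the loop body updates. With a real csv reader you cannot index into it like a list, so you would advance an iterator inside the loop instead.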

Upvotes: 6

SHKT

Reputation: 178

They can't be equivalent because in your first code only one loop is iterating (the for loop), which checks the if-else statement at each iteration of row_index. In your second code, the while loop is a nested loop whose exit condition is never reached (since nothing inside it changes row_index). That makes it go into an infinite loop, thereby giving the memory error.

Upvotes: 0

vartec

Reputation: 134581

In the second case you're stuck forever in the while loop appending the same row over and over...
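You can watch this happen safely with a small sketch. The max_appends cap and the three-item list are made up so the demo stops; the question's code has no such cap, which is why it runs until memory is exhausted.

```python
rows = []
max_appends = 5  # hypothetical safety cap; the question's code has none

for row_index, row in enumerate(["first", "second", "third"]):
    while row_index < 200:   # nothing in the body changes row_index
        rows.append(row)     # the same row is appended every time
        if len(rows) >= max_appends:
            break            # exits the while loop only
    break                    # exit the for loop so the demo ends after row 0

print(rows)  # ['first', 'first', 'first', 'first', 'first']
```

The for loop never gets a chance to move on to "second", because the inner while never finishes on its own.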

Upvotes: 0
