Reputation: 10862
I have a small file-reading routine and I want only the first 200 records. I have it working, but along the way I could not figure out what was wrong with using the "while" construct. This code works:
import csv, sys, zipfile
sys.argv[0] = "/home/tom/Documents/REdata/AllListing1RES.zip"
zip_file = zipfile.ZipFile(sys.argv[0])
items_file = zip_file.open('AllListing1RES.txt', 'rU')
rows = []
for row_index, row in enumerate(csv.DictReader(items_file, dialect='excel', delimiter='\t')):
    if (row_index < 200):
        rows.append(row)
    else:
        break
This code runs until it fails with an out-of-memory condition. I would have thought it was equivalent:
import csv, sys, zipfile
sys.argv[0] = "/home/tom/Documents/REdata/AllListing1RES.zip"
zip_file = zipfile.ZipFile(sys.argv[0])
items_file = zip_file.open('AllListing1RES.txt', 'rU')
rows = []
for row_index, row in enumerate(csv.DictReader(items_file, dialect='excel', delimiter='\t')):
    while (row_index < 200):
        rows.append(row)
    else:
        break
So what would be the right construct using while?
Upvotes: 2
Views: 8596
Reputation: 1995
So I was curious which would be faster and did a quick example in VB.NET. I don't know whether the code I came up with has a logical error, but even when doing only 100000 loops the While loop is faster.
With large amounts of data the time difference is huge.
This has nothing to do with the topic, but it somehow fits the If vs. While theme.
Public Class Form1
    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        Dim watch As New Stopwatch
        Dim i As Integer = 0
        For loops As Integer = 0 To 100000000
            watch.Start()
            If True Then
                i += 1
            End If
            watch.Stop()
        Next
        MessageBox.Show(watch.ElapsedMilliseconds) ' 2740
    End Sub

    Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click
        Dim watch As New Stopwatch
        Dim loops As Integer = 0
        watch.Start()
        While loops < 100000000
            loops += 1
        End While
        watch.Stop()
        MessageBox.Show(watch.ElapsedMilliseconds) ' 300
    End Sub
End Class
Upvotes: 0
Reputation: 41486
The more traditional way of writing that loop would be:
for row_index, row in enumerate(csv.DictReader(items_file, dialect='excel', delimiter='\t')):
    if (row_index >= 200):
        break
    rows.append(row)
As soon as the row counter hits 200, we bail out of the loop.
To use a while loop instead of a for loop (note that, as a looping construct, while is an alternative to for rather than to if) it is necessary to step through the iterator manually:
itr = enumerate(csv.DictReader(items_file, dialect='excel', delimiter='\t'))
row_index = -1
while row_index < 199:
    try:
        row_index, row = next(itr)  # Python 3. Use itr.next() in Python 2
    except StopIteration:
        break  # Ran out of data
    rows.append(row)
All that said, there's actually a superior alternative to both of these options available in the itertools module:
from itertools import islice
itr = csv.DictReader(items_file, dialect='excel', delimiter='\t')
rows = list(islice(itr, 200))
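To try the islice approach without the original zip file, here is a minimal sketch that swaps in an in-memory string for the real data. The sample contents and column names (id, price) are made up for illustration; only the csv, io and itertools calls are real.

```python
import csv
import io
from itertools import islice

# Hypothetical stand-in for the tab-delimited file inside the zip archive.
sample = "id\tprice\n" + "".join(f"{i}\t{i * 1000}\n" for i in range(500))

reader = csv.DictReader(io.StringIO(sample), dialect="excel", delimiter="\t")
rows = list(islice(reader, 200))   # pulls at most 200 rows, then stops

print(len(rows))        # 200
print(rows[0]["id"])    # 0
print(rows[-1]["id"])   # 199
```

If the input holds fewer than 200 rows, islice simply yields what is there, so no StopIteration handling is needed.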
Upvotes: 3
Reputation: 64137
They are not equivalent because your while loop has the condition row_index < 200, which will never be false because row_index never changes while you are in that loop.
This is why you are getting an out-of-memory condition: you are stuck in an infinite loop that keeps appending to rows.
You are essentially saying:
Pseudocode:
stay in block_one as long as row_index < 200:
    block_one:
        rows.append(row)
        goto block_one
You can see that row_index will never change, thus you are going to be in block_one forever.
Whereas the if statement has the following pseudocode:
if row_index < 200 goto block_one otherwise break:
    block_one:
        rows.append(row)
You can see that block_one is not going back to itself, as it does in the while loop.
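The same behaviour can be reproduced safely in a few lines. This is only a sketch: the list of strings stands in for the CSV rows, and SAFETY_CAP is an artificial limit added so the demo terminates instead of exhausting memory.

```python
rows = []
data = ["row0", "row1", "row2"]   # hypothetical stand-in for the CSV rows
SAFETY_CAP = 5                    # only here so the demo terminates

for row_index, row in enumerate(data):
    while row_index < 200:        # row_index is only updated by the for loop
        rows.append(row)
        if len(rows) >= SAFETY_CAP:
            break                 # without this, the while never exits
    break                         # stop the demo after the first row

print(rows)  # ['row0', 'row0', 'row0', 'row0', 'row0']
```

The for loop never gets a chance to advance to the second row, which is why the original script keeps growing rows until memory runs out.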
Upvotes: 6
Reputation: 178
They can't be equivalent because in your first code only one loop is iterating (the for loop), which checks the if-else statement on each iteration of row_index. In your second code, the while loop is a nested loop whose exit condition is never reached (since nothing inside it advances row_index). That makes it loop forever, which causes the memory error.
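For contrast, a while loop does work when its condition depends on something the loop body changes. A rough sketch, assuming Python 3; first_n_rows is a made-up helper name and the sample data is invented:

```python
import csv
import io

def first_n_rows(reader, n):
    """Collect at most n rows from an iterator using a while loop."""
    rows = []
    while len(rows) < n:          # len(rows) grows on every pass
        try:
            rows.append(next(reader))
        except StopIteration:     # fewer than n rows were available
            break
    return rows

sample = "id\tprice\n1\t100\n2\t200\n3\t300\n"   # hypothetical data
reader = csv.DictReader(io.StringIO(sample), dialect="excel", delimiter="\t")
print(len(first_n_rows(reader, 2)))   # 2
```

Here the while loop replaces the for loop entirely rather than being nested inside it.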
Upvotes: 0
Reputation: 134581
In the second case you're stuck forever in the while loop, appending the same row over and over...
Upvotes: 0