algoProg

Reputation: 728

python: text file processing, for loop and read operation conflict

I am learning to code in Python. Now I am experimenting with a file comparison program from here.

My code is:

#!/usr/bin/python3

def main():
    fhand1 = open('mbox.txt')
    print('file handle for mbox is {}'.format(fhand1))
    count = 0
    for l1 in fhand1:
        count = count + 1
        l1 = l1.rstrip()  # Skip 'uninteresting lines'
        if l1.startswith('From:'):
            print('{}'.format(l1))
    print('Number of lines: {}'.format(count))

    fhand2 = open('mbox-short.txt')
    # inp = fhand2.read()  # when placed here, the for loop does not run
    #for l2 in fhand2:
        #if l2.startswith('From:'):
            #print('{}'.format(l2))
    inp = fhand2.read()  # if the for loop is active, this does not work
    print('Total characters in mbox-short: {}'.format(len(inp)))
    print('First 56 characters of mbox-short: {}'.format(inp[:56]))

if __name__ == "__main__": main()

My question is about 'mbox-short.txt'. When I put inp = fhand2.read() before the for l2 in fhand2: loop, the loop body never runs. When I swap the order and run the loop first, read() returns an empty string.

Can someone please explain this?

Btw, I am using JetBrains PyCharm Community Ed 4 IDE.

Thank you in advance.

Upvotes: 2

Views: 896

Answers (4)

Osvald Laurits

Reputation: 1364

Calling .read() on a file object consumes it: the file position moves to the end, so there is nothing left to loop over. You can test this by calling read() with the optional size argument. The size of mbox-short.txt is 94626. Calling read(94625) reads the first 94625 bytes of your file into a string. You can then loop over the remaining 1 byte in the file object (which is the newline character \n). With no argument, file.read([size]) reads the whole file into a string, so nothing remains to iterate over.

filehandle = open("mbox-short.txt")
fileString = filehandle.read(94625)  # read all but the last byte
print(len(fileString))
count = 0
for x in filehandle:  # only the remaining byte is left to iterate over
    print(x)
    count += 1
print(count)

See: https://docs.python.org/2/library/stdtypes.html?highlight=read#file.read

(In the Python 3 documentation there is no file type; read() is documented in the io module instead, for example under io.TextIOBase.read(). The behavior described here has not changed.)
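As a minimal sketch of the same idea, the snippet below shows that a loop after read() sees nothing, and that rewinding with seek(0) makes the lines available again. The sample text and the temporary file are assumptions standing in for mbox-short.txt, so the example can run anywhere.

```python
import os
import tempfile

# Hypothetical sample data standing in for mbox-short.txt.
sample = "From: a@example.com\nSubject: hi\nFrom: b@example.com\n"

# Write the sample to a temporary file so there is a real file to open.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

with open(path) as fh:
    whole = fh.read()                     # consumes everything; position is now at the end
    after_read = [line for line in fh]    # nothing left to iterate over
    fh.seek(0)                            # rewind to the start of the file
    after_seek = [line for line in fh]    # the loop sees every line again

os.remove(path)

print(len(whole), len(after_read), len(after_seek))
```

Using seek(0) avoids opening the file a second time when both the full text and a line-by-line pass are needed.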

Upvotes: 0

Yevgen

Reputation: 1657

What is happening here is that read() returns the full contents of the file and leaves the file position at the end of the file. Any later read() or loop on the same file object starts from that position, which is why you get an empty string (or a loop that never runs).

You need to do either this:

fhand2 = open('mbox-short.txt')
inp = fhand2.read()  # read the whole file once
for l2 in inp.splitlines():  # loop over the string's lines; the file object is already exhausted
    if l2.startswith('From:'):
        print('{}'.format(l2))

or this:

fhand2 = open('mbox-short.txt')
for l2 in fhand2:
    if l2.startswith('From:'):
        print('{}'.format(l2))
fhand2.close()
fhand2 = open('mbox-short.txt')  # re-open the file you have already read
inp = fhand2.read()

See more information on the python i/o here.
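Another option, sketched here under the assumption that only the From: lines and the total character count are needed, is to make a single pass over the file and never call read() at all. The io.StringIO object and its contents are hypothetical stand-ins for open('mbox-short.txt').

```python
import io

# Hypothetical sample data; io.StringIO behaves like an open text file handle.
sample = "From: a@example.com\nSubject: hi\nFrom: b@example.com\n"
fhand2 = io.StringIO(sample)

total_chars = 0
from_lines = []
for l2 in fhand2:
    total_chars += len(l2)          # count characters line by line, newlines included
    if l2.startswith('From:'):
        from_lines.append(l2.rstrip())

for line in from_lines:
    print(line)
print('Total characters: {}'.format(total_chars))
```

Because the file is consumed only once, the order of operations no longer matters.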

Upvotes: 1

santhosh

Reputation: 187

inp = fhand2.readlines() should fix your problem. FYI, see "How do I read a file line-by-line into a list?"
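A short sketch of why this helps: readlines() also consumes the file, but it returns a list, and a list can be looped over as many times as needed. The sample text below is an assumption standing in for the contents of mbox-short.txt.

```python
import io

# Hypothetical sample data; io.StringIO stands in for open('mbox-short.txt').
fhand2 = io.StringIO("From: a@example.com\nSubject: hi\nFrom: b@example.com\n")

inp = fhand2.readlines()            # list of lines, each keeping its trailing '\n'

# First use: filter the lines.
from_lines = [l2 for l2 in inp if l2.startswith('From:')]

# Second use: rebuild the full text from the same list to count characters.
full_text = ''.join(inp)

print(from_lines)
print('Total characters: {}'.format(len(full_text)))
```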

Upvotes: 0

hyades

Reputation: 3160

The read() method reads the full file into a single string. So if, say, your file looks like

1 2 3 4
5 6 7 8

read() will return "1 2 3 4\n5 6 7 8\n". If you then loop over that string (for example, for ch in inp), you go through it one character at a time: 1, a space, 2, and so on. Looping over fhand2 itself after read() yields nothing, because the file position is already at the end of the file.

If you want to read line by line, either use readline(), which fetches the next line, or use readlines(), which returns a list like ["1 2 3 4\n", "5 6 7 8\n"].
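To make the difference concrete, here is a small sketch using the two-line example above. io.StringIO is used as a stand-in for a real file handle, which is an assumption made so the snippet runs without any file on disk.

```python
import io

text = "1 2 3 4\n5 6 7 8\n"

# Iterating over a string goes character by character.
first_chars = [ch for ch in text][:3]

# A file-like object yields whole lines instead.
fh = io.StringIO(text)
first_line = fh.readline()      # fetches the next line only
remaining = fh.readlines()      # list of all lines that are left

print(first_chars)
print(repr(first_line))
print(remaining)
```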

Upvotes: 0
