dasJulian

Reputation: 577

Reading Files: Each function works on its own, but not both together

Here is my code:

import re


def get_email_answers(path):
    for line in path:
        clear = line.strip()
        if re.match(r".*\s.*\t(Antw.+)\t.*Uhr", clear):
            subject = re.findall(r".*\s.*\t(Antw.+)\t.*Uhr", clear)
            print(subject)


def get_sizes(path):
    for line in path:
        clear = line.strip()
        if re.match(r".*\s([0-9][0-9]\s[MKG]B)", clear):
            size = re.findall(r".*([0-9][0-9]\s[MKG]B)", clear)
            print(size)
        elif re.match(r".*\s([0-9][0-9][0-9]\s[MKG]B)", clear):
            size = re.findall(r".*([0-9][0-9][0-9]\s[MKG]B)", clear)
            print(size)
        elif re.match(r".*\s([0-9]\s[MKG]B)", clear):
            size = re.findall(r".*([0-9]\s[MKG]B)", clear)
            print(size)
        elif re.match(r".*(.\.[0-9][0-9]\s[MKG]B)", clear):
            size = re.findall(r".*(.\.[0-9][0-9]\s[MKG]B)", clear)
            print(size)


file_opener = open(r"C:\Users\julia\Documents\RegEX-Test.txt", "r")
get_sizes(file_opener)
get_email_answers(file_opener)

The function get_sizes works, but the function get_email_answers doesn't. If you comment the function get_sizes out, then get_email_answers works perfectly. If you put get_email_answers before get_sizes, then get_sizes doesn't work and get_email_answers does.

I have done this:

def get_email_answers(path):
    print(path) #modified here
    for line in path:
        print("line") #and here
        clear = line.strip()
        if re.match(r".*\s.*\t(Antw.+)\t.*Uhr", clear):
            subject = re.findall(r".*\s.*\t(Antw.+)\t.*Uhr", clear)
            print(subject)

The printed path is the same as in get_sizes, but the for loop never runs! Why? And why does it run when you comment out the other function, get_sizes?

Upvotes: 1

Views: 81

Answers (2)

tyrrr

Reputation: 538

Reading files is a sequential process. When you open a file, an internal "pointer" is created that remembers where in the file you are. At first it points to the beginning of the file, and each time you read a chunk, the "pointer" moves past that chunk and points to the first byte that hasn't been read yet. So after one of your functions has read the file, the pointer sits at the end of it, and when the second function tries to read, the file seems empty. You need to reset the pointer between reads by calling file_opener.seek(0).
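To see the pointer behaviour in isolation, here is a minimal sketch. It assumes nothing about your real file: it writes a throwaway temp file with two made-up lines, iterates over it twice, then rewinds with seek(0) and iterates again.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained
# (the contents are made up, not taken from the question's file).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first line\nsecond line\n")
    demo_path = tmp.name

with open(demo_path, "r") as f:
    first_pass = [line.strip() for line in f]   # consumes the file; pointer is now at the end
    second_pass = [line.strip() for line in f]  # nothing left to read
    f.seek(0)                                   # move the pointer back to the start
    third_pass = [line.strip() for line in f]   # the lines are available again

os.remove(demo_path)

print(first_pass)   # ['first line', 'second line']
print(second_pass)  # []
print(third_pass)   # ['first line', 'second line']
```

The empty second pass is what your second function runs into: the loop body never executes because there is nothing left to iterate over.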

Btw. file_opener is a slightly confusing name - this variable holds the file object itself, not some object offering functionality to open a file.

Upvotes: 2

Exelian

Reputation: 5888

A file object can only be read through once before it is exhausted. You either have to store the file data in a variable (data = file_opener.read()) and iterate over that, or you need to reset the file pointer to the beginning after each function has read the file.

Try this:

get_sizes(file_opener)
file_opener.seek(0)
get_email_answers(file_opener)

To clarify: the issue isn't in the functions, it's in the way you handle the input file.
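The other option, storing the data in a variable, could look roughly like the sketch below. Note the assumptions: the two functions are simplified stand-ins that return lists instead of printing, the regexes are shortened versions rather than the ones from the question, and io.StringIO with invented sample lines stands in for the real file.

```python
import io
import re

# Simplified patterns for illustration only (not the question's regexes).
SIZE_PATTERN = re.compile(r"(\d+(?:\.\d+)?\s[MKG]B)")
ANSWER_PATTERN = re.compile(r"\t(Antw.+)\t.*Uhr")


def get_sizes(lines):
    """Return every size such as '45 KB' found in the given lines."""
    found = []
    for line in lines:
        match = SIZE_PATTERN.search(line.strip())
        if match:
            found.append(match.group(1))
    return found


def get_email_answers(lines):
    """Return the subject of every line whose subject starts with 'Antw'."""
    found = []
    for line in lines:
        match = ANSWER_PATTERN.search(line.strip())
        if match:
            found.append(match.group(1))
    return found


# Stand-in for open(...): invented sample data in a tab-separated layout.
sample = io.StringIO(
    "Max Muster\tAntw: Hallo\t12:30 Uhr\t45 KB\n"
    "Erika Beispiel\tNeue Nachricht\t09:15 Uhr\t1.25 MB\n"
)

with sample as file_obj:
    lines = file_obj.readlines()  # read once; the list can be iterated many times

sizes = get_sizes(lines)
answers = get_email_answers(lines)

print(sizes)    # ['45 KB', '1.25 MB']
print(answers)  # ['Antw: Hallo']
```

Because both functions receive a plain list, the order in which you call them no longer matters. For very large files, rewinding with seek(0) avoids holding everything in memory.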

Upvotes: 1
