lpt

Reputation: 975

`return` exits the function after the first iteration of the loop

I know I am missing some really small concept here.

Here is what I am trying to do: return the titles of all files with the ".html" extension in a directory.

However, the function I wrote returns only the first file's title. If I use `print` instead, it prints all of them.

def titles():
    for file_name in glob.glob(os.path.join(dir_path, "*.html")):
        with open(file_name) as html_file:
            soup = BeautifulSoup(html_file)
            return str(soup.title.get_text().strip())
titles()

Upvotes: 0

Views: 68

Answers (2)

Chris Johnson

Reputation: 21956

You have two choices. Either add each result to a local data structure (say, a list) inside the loop and return the list after the loop, or turn the function into a generator that yields each result inside the loop (no `return`).

The list approach is fine for smaller data sets. The generator approach produces one result at a time instead of holding them all in memory, so it is friendlier, or even necessary, for larger data sets.
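To illustrate why a generator suits large inputs, here is a minimal sketch. The `squares` function is a hypothetical stand-in for `titles()` and is not part of the question's code; it only shows that a generator does no work until a value is requested.

```python
from itertools import islice

def squares(limit):
    """Hypothetical stand-in for titles(): yields one result at a time."""
    for n in range(limit):
        yield n * n

gen = squares(10**9)                # nothing is computed yet
first_three = list(islice(gen, 3))  # only three values are ever produced
print(first_three)  # [0, 1, 4]
```

A list-returning version of the same function would have to build all one billion results before the caller saw the first one.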

Upvotes: 1

cs95

Reputation: 402443

`return` exits the function immediately, so you only get the result of the first iteration. Once the function returns, control passes back to the caller, and the loop does not resume.
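The behavior can be reproduced without any files or parsing. In this small sketch, `first_only` and `all_items` are hypothetical names used only for illustration:

```python
def first_only(items):
    for item in items:
        return item.upper()  # exits the function on the first iteration

def all_items(items):
    results = []
    for item in items:
        results.append(item.upper())
    return results  # runs once, after the loop has finished

print(first_only(["a", "b", "c"]))  # A
print(all_items(["a", "b", "c"]))   # ['A', 'B', 'C']
```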

You have two options.

Option 1 (recommended for a large amount of data): Change `return` to `yield`. Using `yield` turns your function into a generator, and you can loop over the values it yields:

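import glob
import os
from bs4 import BeautifulSoup

def titles():
    # dir_path is assumed to be defined elsewhere, as in the question
    for file_name in glob.glob(os.path.join(dir_path, "*.html")):
        with open(file_name) as html_file:
            soup = BeautifulSoup(html_file, "html.parser")  # name the parser explicitly

        yield soup.title.get_text().strip()  # yield inside the loop, happens multiple times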

for s in titles():
    print(s)

Option 2: Store all your output in a list and return the list at the end:

def titles():
    data = []
    for file_name in glob.glob(os.path.join(dir_path, "*.html")):
        with open(file_name) as html_file:
            soup = BeautifulSoup(html_file, "html.parser")  # name the parser explicitly
        data.append(soup.title.get_text().strip())

    return data # return outside the loop, happens once

print(titles())
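The two options are not mutually exclusive: a generator can be turned into a list at the call site with `list()`. The sketch below uses a hypothetical `words` generator in place of `titles()` so that it runs without any HTML files.

```python
def words():
    """Hypothetical stand-in for the generator version of titles()."""
    for w in ["alpha", "beta", "gamma"]:
        yield w.title()

as_list = list(words())  # consume the generator into a list
print(as_list)  # ['Alpha', 'Beta', 'Gamma']
```

Writing the function as a generator therefore leaves the choice to the caller: iterate lazily, or collect everything at once.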

Upvotes: 2
