Reputation: 53
I've written a piece of code that works fine with print, but fails when I turn it into a function and try to return the values instead. Here's the original code:
import requests
from bs4 import BeautifulSoup
import wikipedia

source_code = requests.get('http://en.wikipedia.org/wiki/IBM')
plain_text = source_code.text
plain_text = plain_text[:plain_text.find('id="toc"')]
soup = BeautifulSoup(plain_text)
for div in soup.findAll('a'):
    if div.parent.name == 'p':
        href = div.get('href')
        href = href.replace(',', '')
        href = href.replace('-', ' ')
        href = href.replace('(', '')
        href = href.replace(')', '')
        href = href.replace('_', ' ')
        print (href[6:])
        href = href.replace(' ', '_')
        href = href.replace(' ^ ', '')
        try:
            print(wikipedia.summary(href[6:]))
        except wikipedia.exceptions.DisambiguationError as e:
            print (e.options)
This formats the text and gives me the title and summary of a Wikipedia page, plus the summaries of all the links in the original summary, which is exactly what I want. Unfortunately, it needs to be part of a bigger program, so I turned it into a function (maybe I should do it another way?). It looks like this:
import requests
from bs4 import BeautifulSoup
import wikipedia

source_code = requests.get('http://en.wikipedia.org/wiki/IBM')
plain_text = source_code.text
plain_text = plain_text[:plain_text.find('id="toc"')]
soup = BeautifulSoup(plain_text)

def ELS():
    for div in soup.findAll('a'):
        if div.parent.name == 'p':
            href = div.get('href')
            href = href.replace(',', '')
            href = href.replace('-', ' ')
            href = href.replace('(', '')
            href = href.replace(')', '')
            href = href.replace('_', ' ')
            return href[6:]
            href = href.replace(' ', '_')
            href = href.replace(' ^ ', '')
            try:
                return wikipedia.summary(href[6:])
            except wikipedia.exceptions.DisambiguationError as e:
                return e.options

print (ELS())
But for some reason it doesn't loop: it prints only the first title and then stops. Maybe it's an easy problem and just something I've missed.
Upvotes: 0
Views: 136
Reputation: 1121754
return immediately exits the function, so the loop never gets past the first matching link.
Collect the information in a list and return that:
def ELS():
    results = []
    for div in soup.findAll('a'):
        if div.parent.name == 'p':
            href = div.get('href')
            href = href.replace(',', '')
            href = href.replace('-', ' ')
            href = href.replace('(', '')
            href = href.replace(')', '')
            href = href.replace('_', ' ')
            href = href.replace(' ', '_')
            href = href.replace(' ^ ', '')
            try:
                results.append((href[6:], wikipedia.summary(href[6:])))
            except wikipedia.exceptions.DisambiguationError as e:
                results.append((href[6:], e.options))
    return results
You can then loop over the results; each entry is a tuple with the processed href value and either the wikipedia.summary() output or the exception's e.options attribute. This lets you reuse the information in other code.
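As a rough sketch of what consuming that list could look like: the sample data below is made up (placeholder strings, not real Wikipedia output), format_results is a hypothetical helper name, and the code assumes e.options is a list of title strings.

```python
# Made-up stand-in for what ELS() might return; real values depend on the page.
results = [
    ("IBM", "placeholder summary text"),
    ("Mercury", ["Mercury (planet)", "Mercury (element)"]),
]

def format_results(results):
    """Turn (title, info) tuples into printable lines."""
    lines = []
    for title, info in results:
        if isinstance(info, list):
            # Assumed: a list here means the lookup was ambiguous (e.options)
            lines.append(title + ": ambiguous, options: " + ", ".join(info))
        else:
            lines.append(title + ": " + info)
    return lines

for line in format_results(results):
    print(line)
```

Because the function hands back plain tuples, the caller decides whether to print them, filter them, or pass them on.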
Upvotes: 1
Reputation: 1410
You just replaced print with return, and that changes how the function behaves: a function stops executing as soon as a return statement runs, so the loop never continues.
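A stripped-down illustration of the difference, with no scraping involved; first_only and collect_all are made-up names used only for this sketch.

```python
def first_only(items):
    # return inside the loop: the function ends during the first iteration
    for item in items:
        return item.upper()

def collect_all(items):
    # accumulate inside the loop, return once after the loop has finished
    collected = []
    for item in items:
        collected.append(item.upper())
    return collected

print(first_only(["a", "b", "c"]))   # A
print(collect_all(["a", "b", "c"]))  # ['A', 'B', 'C']
```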
Try something like this:
def ELS():
    output = []
    for div in soup.findAll('a'):
        if div.parent.name == 'p':
            href = div.get('href')
            href = href.replace(',', '')
            href = href.replace('-', ' ')
            href = href.replace('(', '')
            href = href.replace(')', '')
            href = href.replace('_', ' ')
            output.append(href[6:])
            href = href.replace(' ', '_')
            href = href.replace(' ^ ', '')
            try:
                output.append(wikipedia.summary(href[6:]))
            except wikipedia.exceptions.DisambiguationError as e:
                # e.options is a list (assumed to hold strings); join it so
                # the final "\n".join(output) only ever sees strings
                output.append(", ".join(e.options))
    return "\n".join(output)
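If a single joined string is too rigid, a generator is another option (a different technique from the list above, shown only as a sketch). It keeps the loop shaped like the original print version, with yield in place of print. clean_title and iter_titles are hypothetical names, and the hrefs are sample inputs rather than scraped data.

```python
def clean_title(href):
    # Same replacements as in the question, applied to a '/wiki/...' href
    for old, new in ((',', ''), ('-', ' '), ('(', ''), (')', ''), ('_', ' ')):
        href = href.replace(old, new)
    return href[6:]  # drop the leading '/wiki/'

def iter_titles(hrefs):
    # yield hands back one value, then resumes the loop when asked for the next
    for href in hrefs:
        yield clean_title(href)

for title in iter_titles(['/wiki/Armonk,_New_York', '/wiki/Watson_(computer)']):
    print(title)
```

The caller can loop over the generator directly, or wrap it in list() when all values are needed at once.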
Upvotes: 1
Reputation: 1979
You're returning out of your function, therefore breaking the loop. You need to add your search results to a list or a dict and return it after your loop.
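A minimal sketch of the dict variant, assuming titles are unique. summarize_all is a made-up name, and lookup is a stand-in callable so the example runs without network access; in the real code it would be something like wikipedia.summary.

```python
def summarize_all(titles, lookup):
    # Build the whole mapping inside the loop, return once after it
    summaries = {}
    for title in titles:
        summaries[title] = lookup(title)
    return summaries

def fake_lookup(title):
    # Placeholder for the real summary call
    return "summary of " + title

result = summarize_all(["IBM", "Armonk"], fake_lookup)
print(result["IBM"])  # summary of IBM
```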
Upvotes: 0