Reputation: 353
My problem is related to this answer.
I have following code:
import urllib.request
from bs4 import BeautifulSoup
time = 0
html = urllib.request.urlopen("https://www.kramerav.com/de/Product/VM-2N").read()
html2 = urllib.request.urlopen("https://www.kramerav.com/de/Product/SDIA-IN2-F16").read()
try:
div = str(BeautifulSoup(html).select("div.large-image")[0])
if(str(BeautifulSoup(html).select("div.large-image")[1]) != ""):
div += str(BeautifulSoup(html).select("div.large-image")[1])
time = time + 1
except IndexError:
div = ""
time = time + 1
finally:
print(str(time) + div)
The site of the variable html has 2 div-classes named "large-image". The site of the variable html2 only has 1. With html the program works as intended. But if I switch to html2 the variable div is going to be completely empty.
I would like to save the 1 div-class rather than saving nothing. How could I archieve this?
Upvotes: 1
Views: 831
Reputation: 142889
You can concatenate all items inside for
loop
all_divs = soup.select("div.large-image")
for item in all_divs:
div += str(item)
time += 1
or using join()
time = len(all_divs)
div = ''.join(str(item) for item in all_divs)
You can also write in file directly inside for
loop and you get to row
for item in all_divs:
csv_writer.writerow( [str(item).strip()] )
time += 1
Working example
import urllib.request
from bs4 import BeautifulSoup
import csv
div = ""
time = 0
f = open('output.csv', 'w')
csv_writer = csv.writer(f)
all_urls = [
"https://www.kramerav.com/de/Product/VM-2N",
"https://www.kramerav.com/de/Product/SDIA-IN2-F16",
]
for url in all_urls:
print('url:', url)
html = urllib.request.urlopen(url).read()
try:
soup = BeautifulSoup(html)
all_divs = soup.select("div.large-image")
for item in all_divs:
div += str(item)
time += 1
# or
time = len(all_divs)
div = ''.join(str(item) for item in all_divs)
# or
for item in all_divs:
#div += str(item)
#time += 1
csv_writer.writerow( [time, str(item).strip()] )
except IndexError as ex:
print('Error:', ex)
time += 1
finally:
print(time, div)
f.close()
Upvotes: 0
Reputation: 20540
the variable div is going to be completely empty.
That's because your error handler assigned it the empty string.
Please don't use subscripts, conditionals, and handlers in that way. It would be more natural to iterate over the results of select() with for
, building up a result list (or string).
Also, you should create soup = BeautifulSoup(html)
just once, as that can be a fairly expensive operation, since it carefully parses a potentially long web page. With that, you could build up a list of HTML fragments with:
images = [image
for image in soup.select('div.large-image')]
Or if for some reason you're not fond list comprehensions, you could equivalently write:
images = []
for image in soup.select('div.large-image'):
images.append(image)
and then get the required html with div = '\n'.join(images)
.
Upvotes: 1