Lucas

Reputation: 521

Python BeautifulSoup Looping Through Table Data

I'm very new to Python. I'm trying to capture some data from this page. I want the item name and the item type captured in two lists; I can figure out how to join them into one table later. Any help would be great!

Each line of code works on its own, but the loop doesn't. This works for the first item:

import urllib.request  # the request submodule must be imported explicitly
import bs4 as bs

sauce = urllib.request.urlopen('https://us.diablo3.com/en/item/helm/').read()
soup = bs.BeautifulSoup(sauce, 'lxml')

item_details = soup.find('tbody')
print(item_details)

item_name = item_details.find('div', class_='item-details').h3.a.text
print(item_name)

item_type = item_details.find('ul', class_='item-type').span.text
print(item_type)

This repeats the value of the first item_name over and over:

for div in soup.find_all('div', class_='item-details'):
    item_name = item_details.find('div', class_='item-details').h3.a.text
    print(item_name)
    item_type = item_details.find('ul', class_='item-type').span.text
    print(item_type)

This is the output:

Veil of Steel
Magic Helm
Veil of Steel
Magic Helm
Veil of Steel
Magic Helm
Veil of Steel
Magic Helm
Veil of Steel
Magic Helm
Veil of Steel
Magic Helm
Veil of Steel
Magic Helm
...

Upvotes: 2

Views: 74

Answers (3)

Aakash Dusane

Reputation: 398

This works:

import urllib.request
import bs4 as bs

sauce = urllib.request.urlopen('https://us.diablo3.com/en/item/helm/').read()
soup = bs.BeautifulSoup(sauce, 'lxml')

# Print every item name.
item_names = soup.find_all('div', class_='item-details')
for ele in item_names:
    print(ele.h3.a.text)

# Print every item type.
item_types = soup.find_all('ul', class_='item-type')
for ele in item_types:
    print(ele.span.text)

Why your code didn't work:

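Your loop iterates over every matching div, but its body never uses the loop variable div. Instead it calls item_details.find(...) each time, and find always returns the first match in the table, so the same name and type are printed on every iteration.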
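A minimal sketch of the corrected loop, collecting the values into two lists as the question asks. It assumes each div.item-details contains both the h3 link and the ul.item-type list; the inline HTML is a made-up stand-in for the real page, and html.parser is used instead of lxml only so that no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page markup.
html = """
<table><tbody>
<tr><td><div class="item-details">
  <h3><a href="#">Veil of Steel</a></h3>
  <ul class="item-type"><li><span>Magic Helm</span></li></ul>
</div></td></tr>
<tr><td><div class="item-details">
  <h3><a href="#">Leoric's Crown</a></h3>
  <ul class="item-type"><li><span>Legendary Helm</span></li></ul>
</div></td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")

item_names = []
item_types = []
for div in soup.find_all("div", class_="item-details"):
    # Search inside the current div, not the whole table.
    item_names.append(div.h3.a.text)
    item_types.append(div.find("ul", class_="item-type").span.text)

print(item_names)
print(item_types)
```

The only real change from the loop in the question is that every lookup starts from div rather than from item_details.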

Upvotes: 0

Andersson

Reputation: 52685

You need to use find_all (returns a list) instead of find (returns a single element):

for i, j in zip(item_details.find_all('div', class_='item-details'), item_details.find_all('ul', class_='item-type')):
    print(i.h3.a.text, " - ", j.span.text)

The output is:

Veil of Steel  -  Magic Helm
Leoric's Crown  -  Legendary Helm
Harlequin Crest  -  Magic Helm
The Undead Crown  -  Magic Helm
...

Or, in a more readable format:

names = item_details.find_all('div', class_='item-details')
types = item_details.find_all('ul', class_='item-type')

# item_type avoids shadowing the built-in type()
for name, item_type in zip(names, types):
    print(name.h3.a.text, " - ", item_type.span.text)
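To see the difference between the two calls in isolation, here is a small sketch on made-up markup. The HTML string is illustrative only, not the real page, and html.parser is used so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two matching elements.
html = """
<ul class="item-type"><li><span>Magic Helm</span></li></ul>
<ul class="item-type"><li><span>Legendary Helm</span></li></ul>
"""
soup = BeautifulSoup(html, "html.parser")

# find returns only the first matching element (or None if there is none).
first = soup.find("ul", class_="item-type")
# find_all returns every matching element, in document order.
all_types = soup.find_all("ul", class_="item-type")

first_text = first.span.text
all_texts = [ul.span.text for ul in all_types]
print(first_text)
print(all_texts)
```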

Upvotes: 2

Tobey

Reputation: 1440

You can do this in a single loop over the details sections, rather than saving the names and types in separate lists and matching them up afterwards:

item_details = []
for sections in soup.select('.item-details'):
    item_name = sections.select_one('h3[class*="subheader-"]').text.strip()  # partial match subheader-1, subheader-2, ....
    item_type = sections.select_one('ul[class="item-type"]').text.strip()
    item_details.append([item_name, item_type])

print(item_details)

Output:

[['Veil of Steel', 'Magic Helm'], ["Leoric's Crown", 'Legendary Helm'], ....
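If two separate lists are still wanted, as in the question, the pairs can be split afterwards. This is a sketch that uses only the two rows visible in the output above as sample data; it assumes every row has exactly two entries and that the list is not empty.

```python
# Sample rows copied from the output shown above.
item_details = [
    ["Veil of Steel", "Magic Helm"],
    ["Leoric's Crown", "Legendary Helm"],
]

# zip(*rows) transposes the rows into columns: one of names, one of types.
item_names, item_types = (list(column) for column in zip(*item_details))

print(item_names)
print(item_types)
```

With an empty item_details the unpacking raises ValueError, so check the list first if the page might return no items.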

Upvotes: 1
