Alejandro L

Reputation: 149

Iterate over a changing list length and append to another list

I want to iterate over the result of a BeautifulSoup search whose length changes depending on how many elements match the HTML tag.

from bs4 import BeautifulSoup

# driver is an existing Selenium WebDriver instance
driver.get('https://www.inspection.gc.ca/food-recall-warnings-and-allergy-alerts/2021-02-10/eng/1613010591343/1613010596418')
page_source = driver.page_source

soup = BeautifulSoup(page_source, 'html.parser')
recall_details = soup.find('table', class_='table table-bordered table-condensed')

recalled_products = recall_details.find_all('td')
recalled_products

Output:

[<td>One Ocean</td>,
 <td>Sliced Smoked  Wild Sockeye Salmon</td>,
 <td>300 g</td>,
 <td>6 25984 00005 3</td>,
 <td>11253</td>]

I want to iterate over the td elements and append the text of each one to the matching list, like this:

brands = []
products = []
sizes = []
upcs = []
codes = []

brand = recalled_products[0].text
product = recalled_products[1].text
size = recalled_products[2].text
upc = recalled_products[3].text
code = recalled_products[4].text
brands.append(brand)
products.append(product)
sizes.append(size)
upcs.append(upc)
codes.append(code)

print(brands)
print(products)
print(sizes)
print(upcs)
print(codes)

Output:

['One Ocean']
['Sliced Smoked  Wild Sockeye Salmon']
['300\xa0g']
['6\xa025984\xa000005\xa03']
['11253']

I tried the following loop, but it does not give the expected result: every list ends up with all five values. I think I need some kind of counter.

for i in range(len(recalled_products)):
    brand = recalled_products[i].text
    product = recalled_products[i].text
    size = recalled_products[i].text
    upc = recalled_products[i].text
    code = recalled_products[i].text
    brands.append(brand)
    products.append(product)
    sizes.append(size)
    upcs.append(upc)
    codes.append(code)

print(brands)
print(products)
print(sizes)
print(upcs)
print(codes)

Output:

['One Ocean', 'Sliced Smoked  Wild Sockeye Salmon', '300\xa0g', '6\xa025984\xa000005\xa03', '11253']
['One Ocean', 'Sliced Smoked  Wild Sockeye Salmon', '300\xa0g', '6\xa025984\xa000005\xa03', '11253']
['One Ocean', 'Sliced Smoked  Wild Sockeye Salmon', '300\xa0g', '6\xa025984\xa000005\xa03', '11253']
['One Ocean', 'Sliced Smoked  Wild Sockeye Salmon', '300\xa0g', '6\xa025984\xa000005\xa03', '11253']
['One Ocean', 'Sliced Smoked  Wild Sockeye Salmon', '300\xa0g', '6\xa025984\xa000005\xa03', '11253']

This is a sample of the website's HTML (image not included).

Thank you in advance for any help provided.

Upvotes: 0

Views: 63

Answers (3)

im_baby

Reputation: 988

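This is how I would grab the markup: loop over the rows of the table body and index the cells within each row, rather than looping over every td cell at once.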

from bs4 import BeautifulSoup
import requests

URL = "https://www.inspection.gc.ca/food-recall-warnings-and-allergy-alerts/2021-02-10/eng/1613010591343/1613010596418"

brands = []
products = []
sizes = []
upcs = []
codes = []

page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

recall_details = soup.find("table", class_="table table-bordered table-condensed")

body = recall_details.find("tbody")

rows = body.find_all("tr")

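for row in rows:
    # Each <tr> is one recalled product; its <td> cells are the columns
    data = row.find_all("td")
    brands.append(data[0].text)
    products.append(data[1].text)
    sizes.append(data[2].text)
    upcs.append(data[3].text)
    codes.append(data[4].text)

print(brands)
print(products)
print(sizes)
print(upcs)
print(codes)

This prints: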

['One Ocean']
['Sliced Smoked  Wild Sockeye Salmon']
['300\xa0g']
['6\xa025984\xa000005\xa03']
['11253']
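To try the row-by-row approach without hitting the live site, here is a minimal offline sketch. It assumes a table with a tbody whose rows each hold five td cells in the order brand, product, size, UPC, code. The SAMPLE_HTML string (including its header labels and second row) and the parse_recall_table helper are hypothetical, added only to show that the loop grows with the number of rows.

```python
from bs4 import BeautifulSoup

# Hypothetical sample that mirrors the recall table's layout: one <tr> per
# product, five <td> cells per row. The second row is placeholder data.
SAMPLE_HTML = """
<table class="table table-bordered table-condensed">
  <thead>
    <tr><th>Brand</th><th>Product</th><th>Size</th><th>UPC</th><th>Codes</th></tr>
  </thead>
  <tbody>
    <tr><td>One Ocean</td><td>Sliced Smoked Wild Sockeye Salmon</td>
        <td>300 g</td><td>6 25984 00005 3</td><td>11253</td></tr>
    <tr><td>Example Brand</td><td>Example Product</td>
        <td>1 kg</td><td>0 00000 00000 0</td><td>00000</td></tr>
  </tbody>
</table>
"""


def parse_recall_table(html):
    """Return five parallel lists: brands, products, sizes, upcs, codes."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="table table-bordered table-condensed")
    brands, products, sizes, upcs, codes = [], [], [], [], []
    # Loop over rows, not cells, so each list gets exactly one value per product.
    for row in table.find("tbody").find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) != 5:
            continue  # skip rows that don't have the expected five cells
        brand, product, size, upc, code = cells
        brands.append(brand)
        products.append(product)
        sizes.append(size)
        upcs.append(upc)
        codes.append(code)
    return brands, products, sizes, upcs, codes


brands, products, sizes, upcs, codes = parse_recall_table(SAMPLE_HTML)
print(brands)  # ['One Ocean', 'Example Brand']
```

Passing page.content from requests.get(URL) instead of SAMPLE_HTML should work the same way, assuming the live page still uses this markup. Note that get_text(strip=True) only trims whitespace at the ends of each cell; non-breaking spaces inside the text (the \xa0 in the outputs above) are left as-is.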

I do think that a list of dicts, one per product, would be a better data structure than multiple parallel lists, but of course that depends on your use case.

If you want to do that, you could change the code like this:


recalled = []

...

for row in rows:
    data = row.find_all("td")
    # One dict per table row, keyed by column name
    item = {
        "brand": data[0].text,
        "product": data[1].text,
        "size": data[2].text,
        "upc": data[3].text,
        "code": data[4].text,
    }
    recalled.append(item)

print(recalled)

This prints:

[{'brand': 'One Ocean', 'product': 'Sliced Smoked  Wild Sockeye Salmon', 'size': '300\xa0g', 'upc': '6\xa025984\xa000005\xa03', 'code': '11253'}]
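A compact variant of the dict version pairs a tuple of field names with each row's cell texts using zip. This sketch works on plain lists of strings (for example, [td.text for td in row.find_all("td")] for each row), so it runs without a network connection. The FIELDS names and the rows_to_dicts helper are hypothetical choices, not part of the answer above.

```python
# Hypothetical field names, in the column order of the recall table.
FIELDS = ("brand", "product", "size", "upc", "code")


def rows_to_dicts(rows, fields=FIELDS):
    """Turn each row's list of cell texts into a dict keyed by field name."""
    records = []
    for cells in rows:
        if len(cells) != len(fields):
            # zip() would silently drop extra or missing cells, so fail loudly.
            raise ValueError(f"expected {len(fields)} cells, got {len(cells)}: {cells!r}")
        records.append(dict(zip(fields, cells)))
    return records


rows = [["One Ocean", "Sliced Smoked  Wild Sockeye Salmon", "300\xa0g",
         "6\xa025984\xa000005\xa03", "11253"]]
recalled = rows_to_dicts(rows)
print(recalled)
```

The explicit length check is a design choice: without it, a row with a missing cell would quietly produce a dict with a missing key instead of an error.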

Upvotes: 1

IGotThis

Reputation: 81

The answer depends on what this call returns:

recalled_products = recall_details.find_all('td') 

A = [[<td>beef</td>,
     <td>250g</td>,
     <td>6 25984 00005 3</td>,
     <td>11253</td>],
     [<td>Salmon</td>,
     <td>300 g</td>,
     <td>6 25984 00005 3</td>,
     <td>11253</td>]]

or

B = [<td>beef</td>,
     <td>250g</td>,
     <td>6 25984 00005 3</td>,
     <td>11253</td>,
     <td>Salmon</td>,
     <td>300 g</td>,
     <td>6 25984 00005 3</td>,
     <td>11253</td>]

For A, index it like a 2D array:

for row in recalled_products:  # each row is a list of <td> tags
    brand = row[0].text
    product = row[1].text

For B, use a step in your iteration so that i always points at the first cell of a row:

for i in range(0, len(recalled_products), 4):  # 4 cells per row here; the question's table has 5
    brand = recalled_products[i].text
    product = recalled_products[i+1].text
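For the flat case (B), which is what find_all('td') actually returns, the "counter" the question asks about can be a slice that advances one row at a time. This sketch assumes the five-column layout of the question's table (brand, product, size, UPC, code) and takes plain strings such as [td.text for td in recalled_products]; the split_columns name is hypothetical.

```python
N_COLS = 5  # cells per row in the question's recall table


def split_columns(cells, n_cols=N_COLS):
    """Split a flat list of cell texts into one list per column."""
    if len(cells) % n_cols:
        raise ValueError(f"{len(cells)} cells is not a multiple of {n_cols}")
    columns = [[] for _ in range(n_cols)]
    # Step through the flat list one row (n_cols cells) at a time.
    for start in range(0, len(cells), n_cols):
        row = cells[start:start + n_cols]
        # Send each cell of the row to its own column list.
        for column, value in zip(columns, row):
            column.append(value)
    return columns


cells = ["One Ocean", "Sliced Smoked  Wild Sockeye Salmon", "300\xa0g",
         "6\xa025984\xa000005\xa03", "11253"]
brands, products, sizes, upcs, codes = split_columns(cells)
print(brands)  # ['One Ocean']
```

The length check matters because a single missing cell in a flat list shifts every later value into the wrong column, which is one reason looping over rows, as in the first answer, is more robust.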

Upvotes: 2

jaesle

Reputation: 576

It looks to me as if you need a spreadsheet to hold the data you want to store. You can use the openpyxl library to create one with columns for brands, products, sizes, UPCs and codes, and then store the results from your BeautifulSoup search in it.
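A minimal sketch of that idea with openpyxl, assuming the five parallel lists from the question are already filled. The save_recalls helper, the header labels and the recalls.xlsx filename are hypothetical; Workbook, wb.active, ws.append and wb.save are standard openpyxl calls.

```python
from openpyxl import Workbook

# Hypothetical header labels for the spreadsheet's first row.
HEADERS = ["Brand", "Product", "Size", "UPC", "Code"]


def save_recalls(brands, products, sizes, upcs, codes, path="recalls.xlsx"):
    """Write a header row plus one spreadsheet row per recalled product."""
    wb = Workbook()
    ws = wb.active  # the default worksheet created with the workbook
    ws.title = "Recalls"
    ws.append(HEADERS)
    # zip() walks the five lists in parallel, yielding one product per iteration.
    for row in zip(brands, products, sizes, upcs, codes):
        ws.append(row)
    wb.save(path)
    return path
```

Storing one product per row keeps the columns aligned automatically, which avoids the bookkeeping of five separate lists once the data is on disk.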

Upvotes: 0
