Reputation: 149
I want to iterate over a BeautifulSoup result set whose length changes depending on how many elements match the HTML tag.
driver.get('https://www.inspection.gc.ca/food-recall-warnings-and-allergy-alerts/2021-02-10/eng/1613010591343/1613010596418')
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'html.parser')
recall_details = soup.find('table', class_ = 'table table-bordered table-condensed')
recalled_products = recall_details.find_all('td')
recalled_products
Output:
[<td>One Ocean</td>,
<td>Sliced Smoked Wild Sockeye Salmon</td>,
<td>300 g</td>,
<td>6 25984 00005 3</td>,
<td>11253</td>]
I want to iterate over each td element and append to a list like this:
brands = []
products = []
sizes = []
upcs = []
codes = []
brand = recalled_products[0].text
product = recalled_products[1].text
size = recalled_products[2].text
upc = recalled_products[3].text
code = recalled_products[4].text
brands.append(brand)
products.append(product)
sizes.append(size)
upcs.append(upc)
codes.append(code)
print(brands)
print(products)
print(sizes)
print(upcs)
print(codes)
Output:
['One Ocean']
['Sliced Smoked Wild Sockeye Salmon']
['300\xa0g']
['6\xa025984\xa000005\xa03']
['11253']
I tried the following code, but I am not getting the expected result. I need some kind of counter I think.
```
for i in range(len(recalled_products)):
    brand = recalled_products[i].text
    product = recalled_products[i].text
    size = recalled_products[i].text
    upc = recalled_products[i].text
    code = recalled_products[i].text
    brands.append(brand)
    products.append(product)
    sizes.append(size)
    upcs.append(upc)
    codes.append(code)
print(brands)
print(products)
print(sizes)
print(upcs)
print(codes)
```
Output:
```
['One Ocean', 'Sliced Smoked Wild Sockeye Salmon', '300\xa0g', '6\xa025984\xa000005\xa03', '11253']
['One Ocean', 'Sliced Smoked Wild Sockeye Salmon', '300\xa0g', '6\xa025984\xa000005\xa03', '11253']
['One Ocean', 'Sliced Smoked Wild Sockeye Salmon', '300\xa0g', '6\xa025984\xa000005\xa03', '11253']
['One Ocean', 'Sliced Smoked Wild Sockeye Salmon', '300\xa0g', '6\xa025984\xa000005\xa03', '11253']
['One Ocean', 'Sliced Smoked Wild Sockeye Salmon', '300\xa0g', '6\xa025984\xa000005\xa03', '11253']
```
This is a sample html code of the website
Thank you in advance for any help provided.
Upvotes: 0
Views: 63
Reputation: 988
This is how I would grab the markup.
from bs4 import BeautifulSoup
import requests
URL = "https://www.inspection.gc.ca/food-recall-warnings-and-allergy-alerts/2021-02-10/eng/1613010591343/1613010596418"
brands = []
products = []
sizes = []
upcs = []
codes = []
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
recall_details = soup.find("table", class_="table table-bordered table-condensed")
body = recall_details.find("tbody")
rows = body.find_all("tr")
for row in rows:
    # Each row holds one recalled product; its cells are in a fixed order.
    data = row.find_all("td")
    brands.append(data[0].text)
    products.append(data[1].text)
    sizes.append(data[2].text)
    upcs.append(data[3].text)
    codes.append(data[4].text)
Printing each list gives:
['One Ocean']
['Sliced Smoked Wild Sockeye Salmon']
['300\xa0g']
['6\xa025984\xa000005\xa03']
['11253']
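The `\xa0` in the output is a non-breaking space, most likely carried over from the page's markup. If ordinary spaces are preferred, one option is to replace it after extracting the text. This is only a minimal sketch: `clean` is a hypothetical helper name, and plain strings stand in for the scraped values.

```python
def clean(text):
    """Replace non-breaking spaces with ordinary spaces and trim the ends."""
    return text.replace("\xa0", " ").strip()

# Strings standing in for the values collected from the table.
sizes = ["300\xa0g"]
upcs = ["6\xa025984\xa000005\xa03"]

clean_sizes = [clean(s) for s in sizes]
clean_upcs = [clean(u) for u in upcs]

print(clean_sizes)  # ['300 g']
print(clean_upcs)   # ['6 25984 00005 3']
```

The same helper could be applied at append time, for example `sizes.append(clean(data[2].text))`.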
I do think that a list of dicts would be a better data structure than multiple parallel lists, but of course that depends on your use case.
If you wanted to do that you could change the code like this:
recalled = []
...
for row in rows:
    data = row.find_all("td")
    item = {
        "brand": data[0].text,
        "products": data[1].text,
        "sizes": data[2].text,
        "upcs": data[3].text,
        "codes": data[4].text,
    }
    recalled.append(item)
Printing recalled gives:
[{'brand': 'One Ocean', 'products': 'Sliced Smoked Wild Sockeye Salmon', 'sizes': '300\xa0g', 'upcs': '6\xa025984\xa000005\xa03', 'codes': '11253'}]
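If every row has the same five cells in the same order, the dict can also be built by pairing the cell texts with a tuple of field names. This is a minimal sketch under assumptions: plain strings stand in for `td.text`, and the `FIELDS` tuple and `row_to_item` helper are hypothetical names, not part of the code above.

```python
FIELDS = ("brand", "product", "size", "upc", "code")  # hypothetical field names

def row_to_item(cell_texts):
    """Pair each cell's text with its field name."""
    return dict(zip(FIELDS, cell_texts))

# Strings standing in for [td.text for td in row.find_all("td")].
row = [
    "One Ocean",
    "Sliced Smoked Wild Sockeye Salmon",
    "300\xa0g",
    "6\xa025984\xa000005\xa03",
    "11253",
]

recalled = [row_to_item(row)]
print(recalled[0]["brand"])
```

Keeping the field names in one place means a change in column order only needs one edit.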
Upvotes: 1
Reputation: 81
A question about the data: does this call return a nested list like A or a flat list like B?
recalled_products = recall_details.find_all('td')
A = [[<td>beef</td>,
<td>250g</td>,
<td>6 25984 00005 3</td>,
<td>11253</td>],
[<td>Salmon</td>,
<td>300 g</td>,
<td>6 25984 00005 3</td>,
<td>11253</td>]]
or
B = [<td>beef</td>,
<td>250g</td>,
<td>6 25984 00005 3</td>,
<td>11253</td>,
<td>Salmon</td>,
<td>300 g</td>,
<td>6 25984 00005 3</td>,
<td>11253</td>]
For A, index it as a 2D array:
for i in range(len(recalled_products)):
    brand = recalled_products[i][0].text
    product = recalled_products[i][1].text
For B, step through the list by the number of cells per row (4 in this example):
for i in range(0, len(recalled_products), 4):
    brand = recalled_products[i].text
    product = recalled_products[i + 1].text
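For the flat case, the stepping idea can be tried without any scraping. This is a minimal sketch under assumptions: plain strings stand in for each `td.text`, the second row is invented for illustration, and `COLUMNS` is a hypothetical constant set to 5 to match the table in the question rather than the 4 used in the example above.

```python
# Plain strings stand in for td.text values; the second row is invented.
cells = [
    "One Ocean", "Sliced Smoked Wild Sockeye Salmon", "300 g", "6 25984 00005 3", "11253",
    "Example Brand", "Example Product", "250 g", "0 00000 00000 0", "00000",
]

COLUMNS = 5  # assumed number of <td> cells per table row

# Step through the flat list one row at a time.
rows = [cells[i:i + COLUMNS] for i in range(0, len(cells), COLUMNS)]

# Transpose the rows into one list per column.
brands, products, sizes, upcs, codes = (list(col) for col in zip(*rows))

print(brands)
print(codes)
```

This only works if every row has exactly `COLUMNS` cells; a table with merged or missing cells would need per-row handling instead.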
Upvotes: 2
Reputation: 576
This looks to me as if you need a spreadsheet to hold the data. You can use the openpyxl library to create columns for brands, products, sizes, UPCs, and codes, and then store the results from your BeautifulSoup object in the spreadsheet.
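As a dependency-free stand-in for openpyxl, the same column layout can be sketched with the standard library's `csv` module, whose output opens in most spreadsheet programs. This is not the openpyxl approach itself, only an illustration of the layout; the row values are copied from the question and the file name in the comment is hypothetical.

```python
import csv
import io

header = ["brand", "product", "size", "upc", "code"]
rows = [
    ["One Ocean", "Sliced Smoked Wild Sockeye Salmon", "300 g", "6 25984 00005 3", "11253"],
]

# Write to an in-memory buffer here; open("recalls.csv", "w", newline="")
# (a hypothetical file name) would write to disk instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)   # one column per field
writer.writerows(rows)    # one line per recalled product

print(buffer.getvalue())
```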
Upvotes: 0