Reputation: 113
Although I am apparently not the first person to run into this problem, I could not find an answer that solves it.
I am scraping an HTML table, and although I loop through it, I only get the first row of the table.
import requests
from bs4 import BeautifulSoup

# Webpage connection
html = "https://www.wegochem.com/chemicals/organic-intermediates/supplier-distributor"
r = requests.get(html)
c = r.content
soup = BeautifulSoup(c, "html.parser")

# Grab title-artist classes and store in recordList
wegoList = soup.find_all("tbody")
try:
    for items in wegoList:
        material = items.find("td", {"class": "click_whole_cell",}).get_text().strip()
        cas = items.find("td", {"class": "text-center",}).get_text().strip()
        category = items.find("div", {"class": "text-content short-text",}).get_text().strip()
        print(material, cas, category)
except:
    pass
The result for the first row comes out correctly (1,2-Dimethylimidazole 1739-84-0 Organic Intermediates, Plastic, Resin & Rubber, Coatings); however, the for loop does not continue through the rest of the table.
Thank you for any help.
Upvotes: 1
Views: 2758
Reputation: 12669
Try this code. find() returns only the first matching element, so use find_all() to collect every matching cell. I also dropped the bare try/except, because it silently hides any error:
import requests
from bs4 import BeautifulSoup

# Webpage connection
html = "https://www.wegochem.com/chemicals/organic-intermediates/supplier-distributor"
r = requests.get(html)
c = r.content
soup = BeautifulSoup(c, "html.parser")

# Collect every tbody, then print all matching cells of each kind
wegoList = soup.find_all("tbody")
for items in wegoList:
    material = items.find_all("td", {"class": "click_whole_cell"})
    for i in material:
        print(i.get_text().strip())
    cas = items.find_all("td", {"class": "text-center"})
    for i in cas:
        print(i.get_text().strip())
    category = items.find_all("div", {"class": "text-content short-text"})
    for i in category:
        print(i.get_text().strip())
Updated code:
import requests
from bs4 import BeautifulSoup

# Webpage connection
html = "https://www.wegochem.com/chemicals/organic-intermediates/supplier-distributor"
r = requests.get(html)
c = r.content
soup = BeautifulSoup(c, "html.parser")

# Collect the cells of each column, then pair them up row by row with zip
wegoList = soup.find_all("tbody")
for items in wegoList:
    material = items.find_all("td", {"class": "click_whole_cell"})
    cas = items.find_all("td", {"class": "text-center"})
    category = items.find_all("div", {"class": "text-content short-text"})
    for i in zip(material, cas, category):
        print(i[0].get_text().strip(), i[1].get_text().strip(), i[2].get_text().strip())
Output:
1,2-Dimethylimidazole 1739-84-0 Organic Intermediates, Plastic, Resin & Rubber, Coatings
1,6-Hexanediol 629-11-8 Adhesives & Sealants, Industrial Chemicals, Inks & Digital Inks, Organic Intermediates, Plastic, Resin & Rubber, Coatings
2,2,4-Trimethyl-1,3-Pentanediol Monoisobutyrate 25265-77-4 Inks & Digital Inks, Oil Field Services, Organic Intermediates, Solvents & Degreasers, Coatings
2,6-Dichloroaniline 608-31-1 Agricultural Chemicals, Crop Protection, Organic Intermediates
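One caveat about pairing the columns with zip: it stops at the shortest list, so if any row lacks one of the cells, rows are dropped or shifted without any error. A minimal sketch of that behaviour, using made-up lists (the missing CAS number is hypothetical, not something observed on the page):

```python
from itertools import zip_longest

# Hypothetical column data: three materials but only two CAS cells found
materials = ["1,2-Dimethylimidazole", "1,6-Hexanediol", "2,6-Dichloroaniline"]
cas_numbers = ["1739-84-0", "629-11-8"]

# zip() silently stops at the shortest input
paired = list(zip(materials, cas_numbers))

# zip_longest() keeps every item and pads the gaps with a fill value
padded = list(zip_longest(materials, cas_numbers, fillvalue=""))

print(len(paired), len(padded))
```

Note that padding only makes the gap visible at the end; it cannot tell which row the missing cell belonged to. Searching inside each tr row keeps the three values of a row together and avoids the problem.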
Upvotes: 0
Reputation: 3844
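for items in wegoList:
loops through the list of tbody elements, not through the rows.
Inside that loop you call find() on the entire table body, and find() returns only the first match, which is why you get just the first row. You should loop through every tr row instead: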
wegoList = soup.find_all("tbody")
for tbody in wegoList:
    # Each tr is one table row; search within the row, not the whole tbody
    for tr in tbody.find_all("tr"):
        material = tr.find("td", {"class": "click_whole_cell"})
        cas = tr.find("td", {"class": "text-center"})
        category = tr.find("div", {"class": "text-content short-text"})
        # Skip rows that lack any of the three cells
        if material is None or cas is None or category is None:
            continue
        print(material.get_text().strip(), cas.get_text().strip(), category.get_text().strip())
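To try the row-by-row idea without hitting the site, here is a small self-contained sketch. The HTML string is a made-up sample that only mimics the class names from the question (the real page markup may differ), and extract_rows is a hypothetical helper name:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; only the class names are taken from the question
SAMPLE_HTML = """
<table>
  <tbody>
    <tr>
      <td class="click_whole_cell">1,2-Dimethylimidazole</td>
      <td class="text-center">1739-84-0</td>
      <td><div class="text-content short-text">Organic Intermediates</div></td>
    </tr>
    <tr>
      <td class="click_whole_cell">1,6-Hexanediol</td>
      <td class="text-center">629-11-8</td>
      <td><div class="text-content short-text">Adhesives &amp; Sealants</div></td>
    </tr>
  </tbody>
</table>
"""


def extract_rows(html):
    """Return one (material, cas, category) tuple per complete table row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        material = tr.find("td", class_="click_whole_cell")
        cas = tr.find("td", class_="text-center")
        category = tr.find("div", class_="text-content")
        # Skip header or malformed rows that lack one of the cells
        if material is None or cas is None or category is None:
            continue
        rows.append((
            material.get_text(strip=True),
            cas.get_text(strip=True),
            category.get_text(strip=True),
        ))
    return rows


# find() on the whole tbody returns only the first matching cell
tbody = BeautifulSoup(SAMPLE_HTML, "html.parser").find("tbody")
first_only = tbody.find("td", class_="click_whole_cell").get_text(strip=True)

# Iterating over the tr elements yields every row
all_rows = extract_rows(SAMPLE_HTML)

print(first_only)
for row in all_rows:
    print(*row)
```

Returning a list of tuples instead of printing inside the loop makes it easy to write the rows to a CSV file or a DataFrame later.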
Upvotes: 1