Reputation: 169
There are multiple div elements that share the class "row", and within each row there are multiple div elements that share the class "column".
I am trying to iterate through the rows, gathering only the first column of each row.
I then print the contents of the link in that column.
What is the correct way to do this? I have tried making a list, but after creating the list I can no longer use the BeautifulSoup functions on the object.
This is the link to the URL:
rows = soup.find_all('div', attrs={'class': 'row'})
for row in rows:
    # find() returns only the first matching column in this row.
    col = row.find('div', attrs={'class': 'column'})
    link = col.find('a')
    print(link.contents)
Upvotes: 0
Views: 2758
Reputation: 9430
It looks like you need a cookie set before you can see the content on the subcategory page. So, if I understand the question right:
import requests
from bs4 import BeautifulSoup
# You need to store cookies so use a session.
s = requests.Session()
# Request a page to get the cookie.
s.get("https://www.theherbarium.com/products/?category=Essential%20Oils%20And%20Accessories")
# Make the real request.
page = s.get("https://www.theherbarium.com/products/?category=Essential%20Oils%20And%20Accessories&subcategory=Superior%20Quality%20Essential%20Oils")
soup = BeautifulSoup(page.content,'html.parser')
# Get the div.
divs = soup.find_all('div', attrs={'class': 'col-sm-4 column-spacer'})
# Get the a element text.
for div in divs:
    print(div.find('a').text)
Outputs:
Balsam Fir 15 ml
Balsam Fir 30 ml
Balsam Fir 5 ml
Basil Essential Oil 15ml
Basil Essential Oil 30ml
Basil Essential Oil 3ml
Basil Essential Oil 5ml
Bergamot Essential Oil 15ml
...
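The row/column loop from the question follows the same pattern. Below is a minimal sketch of it, run against a made-up HTML snippet rather than the real page: the class names "row" and "column" come from the question, while the markup, the link text, and the variable name first_links are assumptions for illustration. It also shows that the elements stay BeautifulSoup Tag objects when they are collected in a plain list.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
html = """
<div class="row">
  <div class="column"><a href="/a1">First A</a></div>
  <div class="column"><a href="/a2">Second A</a></div>
</div>
<div class="row">
  <div class="column"><a href="/b1">First B</a></div>
  <div class="column"><a href="/b2">Second B</a></div>
</div>
<div class="row">
  <div class="column">No link here</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

first_links = []
for row in soup.find_all("div", class_="row"):
    # find() returns only the first matching column, or None if there is none.
    col = row.find("div", class_="column")
    link = col.find("a") if col is not None else None
    if link is not None:
        # The list holds Tag objects, so BeautifulSoup methods still work on them.
        first_links.append(link)

for link in first_links:
    print(link.get_text(strip=True), link.get("href"))
```

The None checks are there because find() returns None when nothing matches, which would otherwise raise an AttributeError on the next call.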
If you just want the unique names, strip the size off with a regex and add each name to a set:
import requests
from bs4 import BeautifulSoup
import re
# You need to store cookies so use a session.
s = requests.Session()
# Request a page to get the cookie.
s.get("https://www.theherbarium.com/products/?category=Essential%20Oils%20And%20Accessories")
# Make the real request.
page = s.get("https://www.theherbarium.com/products/?category=Essential%20Oils%20And%20Accessories&subcategory=Superior%20Quality%20Essential%20Oils")
soup = BeautifulSoup(page.content,'html.parser')
# Get the div.
divs = soup.find_all('div', attrs={'class': 'col-sm-4 column-spacer'})
# Get the a element text.
a = set()
for div in divs:
    text = div.find('a').text
    # Raw string so the backslashes reach the regex engine unchanged.
    a.add(re.sub(r'\s*\d+\s*ml$', '', text))
print(a)
Outputs:
{'Lavender, Bulgarian Essential Oil', 'Thyme, White', 'Mandarin, Red Essential Oil', 'Pine Needle Essential Oil', 'Lemongrass Essential Oil', 'Fir Needle, Siberian', 'Spruce', 'Peppermint', 'Lime Essential Oil', 'Myrrh', 'Juniper Essential Oil', 'Petitgrain', 'Wintergreen', 'Lemon Essential Oil', 'Palmarosa', 'Balsam Fir', 'Chamomile, Roman', 'Cypress', 'Citronella', 'Rosemary', 'Lemon myrtle Essential Oil', 'Clary Sage', 'Cinnamon Bark', 'Frankincense', 'Tangerine', 'Cocoa, Absolute', 'Spearmint', 'Ravensara Essential Oil', 'Spike Lavender Essential Oil', 'Hyssop', 'Ylang Ylang', 'Basil Essential Oil', 'Bergamot Essential Oil', 'Fir Needle, Siberian1', 'Geranium Bourbon', 'Patchouli', 'Black Pepper Essential Oil', 'Fennel', 'Grapefruit Essential Oil', 'Eucalyptus', 'Carrot Seed Essential Oil', 'Chamomile, German', 'Vetiver', 'Tea Tree', 'Ginger', 'Marjoram, Sweet', 'Clove Bud'}
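To see what the regex does without hitting the site, here is a small sketch that applies the same pattern to a few of the product names from the first output. The names SIZE_SUFFIX, names, and unique are just illustrative. Sets are unordered, which is why the names above come out in arbitrary order; sorting the result gives a stable list.

```python
import re

# Same pattern as above: optional whitespace, digits, optional whitespace, "ml" at the end.
SIZE_SUFFIX = re.compile(r"\s*\d+\s*ml$")

# A few product names copied from the first output.
names = [
    "Balsam Fir 15 ml",
    "Balsam Fir 30 ml",
    "Balsam Fir 5 ml",
    "Basil Essential Oil 15ml",
    "Basil Essential Oil 3ml",
    "Bergamot Essential Oil 15ml",
]

# Strip the size, de-duplicate with a set, then sort for a stable order.
unique = sorted({SIZE_SUFFIX.sub("", name) for name in names})
print(unique)
```

Names that do not end in a size, or that use a different unit, pass through unchanged, so it is worth checking the result for leftovers.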
Upvotes: 1