Reputation: 327
I'm writing a script to check whether a product has been loaded onto a website.
import requests
import time
from bs4 import BeautifulSoup
r = requests.get('https://www.off---white.com/en/GB/section/new-arrivals.js')
soup = BeautifulSoup(r.text, 'html.parser')
text = '3.0'
while text not in soup:
    print('not found')
    r = requests.get('https://www.off---white.com/en/GB/section/new-arrivals.js')
    soup = BeautifulSoup(r.text, 'html.parser')
    time.sleep(5)
When I print soup I can see that '3.0' is in there. But when I run the script it does not recognize that '3.0' is there. What am I doing wrong?
Upvotes: 1
Views: 136
Reputation: 7238
If you only want to check whether the text is present in the source code, you don't need BeautifulSoup. You can check it directly using requests.
import requests
import time

r = requests.get('https://www.off---white.com/en/GB/section/new-arrivals.js')
text = '3.0'
# r.text is a plain str, so "in" performs a substring search
while text not in r.text:
    print('not found')
    r = requests.get('https://www.off---white.com/en/GB/section/new-arrivals.js')
    time.sleep(5)
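The loop above keeps polling forever if the text never appears. A minimal sketch of a bounded version follows; wait_for_text, fetch, interval and max_attempts are hypothetical names introduced here, not part of requests. The download is passed in as a callable so the helper can be tried without a network connection.

```python
import time


def wait_for_text(fetch, needle, interval=5.0, max_attempts=10):
    """Call fetch() until its result contains needle.

    Returns the 1-based attempt number on success, or None if the text
    was not found within max_attempts tries.
    """
    for attempt in range(1, max_attempts + 1):
        if needle in fetch():          # plain substring test on a str
            return attempt
        print('not found')
        if attempt < max_attempts:
            time.sleep(interval)       # pause before the next request
    return None
```

With requests, this could be called as wait_for_text(lambda: requests.get(url).text, '3.0'), where url is the page being polled.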
If you need to use BeautifulSoup for any other reason, you can use any one of the following:
while text not in soup.text:
while text not in soup.get_text():
while text not in str(soup):
Now, if you are curious as to why while text not in soup isn't working, read the following:
The magic method that defines the behaviour of x in y is __contains__(self, item). If you look at the source code of BeautifulSoup.__contains__, it is given by:
def __contains__(self, x):
    return x in self.contents
So, by using while text not in soup, you are checking whether text is an item of the list of elements (either Tag or NavigableString) returned by .contents. List membership compares text against each whole element; it does not search inside them. Since 3.0 is only part of some text inside a tag, it is not equal to any item in that list, and hence '3.0' in soup returns False.
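The same behaviour can be reproduced without BeautifulSoup. The sketch below uses a hypothetical stand-in class, MiniNode, whose __contains__ has the same shape as the one quoted above; it is not the real bs4 class, only an illustration of a membership test versus a substring test.

```python
class MiniNode:
    """Hypothetical stand-in for a parsed tag: a list of children."""

    def __init__(self, *contents):
        self.contents = list(contents)

    def __contains__(self, x):
        # Same shape as the method quoted above: list membership only.
        return x in self.contents

    def get_text(self):
        # Join the text of all descendants, depth first.
        return ''.join(
            c.get_text() if isinstance(c, MiniNode) else c
            for c in self.contents
        )


# Roughly <html><p>version 3.0</p></html>
doc = MiniNode(MiniNode('version 3.0'))

found_by_membership = '3.0' in doc            # no child equals '3.0'
found_by_substring = '3.0' in doc.get_text()  # substring of the joined text
```

Here found_by_membership is False while found_by_substring is True, which matches what the question describes: the text is visible when printed, but the in test on the object itself does not find it.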
To check the source code, you can either go to the libraries installed on your PC and check the code, or use the following:
import inspect
from bs4 import BeautifulSoup
print(inspect.getsource(BeautifulSoup.__contains__))
Upvotes: 1
Reputation: 151
Hi, I have three things for you to try:
1: make sure soup is a string by doing:
while text not in str(soup):
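2: try rearranging the while loop to the following (note that Python evaluates not text in soup exactly like text not in soup, so on its own this will not change the result):
while not text in soup: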
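3: if soup is a list and not a string, look inside each item (list.index raises ValueError for a missing item, so comparing its result with -1 does not work):
while not any(text in item for item in soup):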
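Suggestions 2 and 3 can be checked quickly on a plain Python list. In this sketch, items is made-up data used only for illustration; it is not the output of the script in the question.

```python
items = ['version 3.0', 'other']   # illustrative data only
text = '3.0'

# Both spellings are the same membership test.
same_result = (not text in items) == (text not in items)

# list.index signals a missing item with an exception, not with -1.
try:
    items.index(text)
    index_raised = False
except ValueError:
    index_raised = True

# A substring search across a list has to look inside each item.
found_in_any = any(text in item for item in items)
```

The two membership spellings agree, index raises ValueError because no item equals '3.0' exactly, and the any(...) form finds the substring inside 'version 3.0'.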
Upvotes: 0