Piers Thomas

Reputation: 327

"while text not in soup:" does not detect text that is present in the soup

I'm writing a script to check whether a product has been loaded onto a website.

import requests
import time
from bs4 import BeautifulSoup

r = requests.get('https://www.off---white.com/en/GB/section/new-arrivals.js')
soup = BeautifulSoup(r.text, 'html.parser')
text = '3.0'

while text not in soup:
    print('not found')
    r = requests.get('https://www.off---white.com/en/GB/section/new-arrivals.js')
    soup = BeautifulSoup(r.text, 'html.parser')
    time.sleep(5)

When I print soup I can see that '3.0' is in there. But when I run the script it does not recognize that '3.0' is there. What am I doing wrong?

Upvotes: 1

Views: 136

Answers (2)

Keyur Potdar

Reputation: 7238

If you only want to check whether the text is present in the page source, you don't need BeautifulSoup. You can check the response body from requests directly.

import time
import requests

url = 'https://www.off---white.com/en/GB/section/new-arrivals.js'
text = '3.0'

r = requests.get(url)
# r.text is a plain string, so "in" performs a substring search
while text not in r.text:
    print('not found')
    time.sleep(5)  # wait before polling again
    r = requests.get(url)
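
The loop above runs forever if the text never appears. As a rough sketch of one way to bound it, the helper below polls a caller-supplied function a limited number of times. The names wait_for_text, fetch, max_attempts and sleep are hypothetical and are not part of requests.

```python
import time


def wait_for_text(fetch, text, interval=5.0, max_attempts=10, sleep=time.sleep):
    """Poll fetch() until text appears in the string it returns.

    fetch is any zero-argument callable returning a string (an assumption
    made so the helper does not depend on a particular HTTP library).
    Returns the 1-based attempt number on success, or None on giving up.
    """
    for attempt in range(1, max_attempts + 1):
        if text in fetch():  # plain substring search on a str
            return attempt
        if attempt < max_attempts:
            sleep(interval)  # wait before the next poll
    return None
```

With requests, fetch could be something like lambda: requests.get(url, timeout=10).text, and a None result means the text was not seen within max_attempts polls.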

If you need to use BeautifulSoup for any other reason, you can use any one of the following:

  • while text not in soup.text
  • while text not in soup.get_text()
  • while text not in str(soup)
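
These options are not fully interchangeable. As a small illustration, using a made-up HTML snippet rather than the real page, get_text() only sees text nodes, while str(soup) is the serialized markup and therefore also includes attribute values:

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration: '3.0' appears only in an attribute value.
html = '<div data-version="3.0"><p>Model 2.0</p></div>'
soup = BeautifulSoup(html, 'html.parser')

in_text = '3.0' in soup.get_text()  # searches only the text nodes ('Model 2.0')
in_markup = '3.0' in str(soup)      # searches the whole serialized markup

print(in_text, in_markup)
```

If the value you are looking for can appear inside tags or attributes, str(soup) (or r.text) is probably the safer check.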

Now, if you are curious as to why while text not in soup isn't working, read the following:

The magic method that defines the behaviour of x in y is __contains__(self, item). If you look at the source code of BeautifulSoup.__contains__, it is given by:

def __contains__(self, x):
    return x in self.contents

So, by using while text not in soup, you are checking whether text is an item of the list of elements (each either a Tag or a NavigableString) returned by .contents. Since '3.0' is only part of some text nested inside a tag, it is not itself an item of that list, and hence '3.0' in soup returns False.
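
To see the effect in isolation, here is a minimal sketch that uses a hypothetical stand-in class (FakeTag is not part of bs4) with the same __contains__ rule. Membership in a list compares whole items, so neither nested text nor a substring of a child matches:

```python
class FakeTag:
    """Hypothetical stand-in for a parsed tag; not part of bs4."""

    def __init__(self, *contents):
        self.contents = list(contents)

    def __contains__(self, x):
        # Same rule as the method quoted above: only direct children are compared.
        return x in self.contents


paragraph = FakeTag('version 3.0')  # a "tag" holding one string
root = FakeTag(paragraph)           # the "soup" holding that tag

direct_child = paragraph in root          # the tag itself is a direct child
nested_text = 'version 3.0' in root       # the string sits one level down
substring = '3.0' in paragraph            # list membership is equality, not substring search
exact_child = 'version 3.0' in paragraph  # an exact match on a direct child
```

Only direct_child and exact_child come out True, which mirrors why '3.0' in soup is False for the real page.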


To check the source code, you can either go to the libraries installed on your PC and check the code, or use the following:

import inspect
from bs4 import BeautifulSoup

print(inspect.getsource(BeautifulSoup.__contains__))

Upvotes: 1

e11i0t23

Reputation: 151

Hi, I have three things for you to try:

1: make sure soup is a string by doing:

while text not in str(soup):

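2: try rearranging the while loop (note that Python parses not text in soup as not (text in soup), so this is equivalent to the original condition and is unlikely to change the result):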

while not text in soup:

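3: if soup is a list of strings rather than a single string, check each item (index raises ValueError when the item is missing; it never returns -1):

while not any(text in item for item in soup):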

Upvotes: 0
