casualcoder

Reputation: 561

How do I tell whether I am parsing websites accurately?

I built this function to tell me whether a website has changed. I'm not sure it works: I tried it on a few websites that had not changed, and it reported a change anyway. Is there an issue, and if so, where?

I put the code in a function so that the user can enter any site. This is the code:

import time
import urllib.request


def checksite(userurl):
    change = False

    # First snapshot of the raw page bytes.
    u = urllib.request.urlopen(userurl)
    webContent1 = u.read()

    time.sleep(60)

    # Second snapshot, one minute later.
    u = urllib.request.urlopen(userurl)
    webContent2 = u.read()

    if webContent1 == webContent2:
        print("Everything is normal")
    else:
        print("Warning, there has been a change to the website!")
        change = True

    return change


userurl = input("Please enter a valid url: ")
checksite(userurl)

Upvotes: 1

Views: 67

Answers (3)

Raja G

Reputation: 6633

I tested your code against a local Python web server and it works fine.

I started the server with python -m http.server, having first placed an index.html with some content in the same directory.

and your code, with the sleep shortened to 15 seconds and the fetched content printed

import time
import urllib.request

userurl = 'http://localhost:8000/index.html'

def checksite(userurl):
    change = False
    u = urllib.request.urlopen(userurl)
    webContent1 = u.read()
    print(webContent1)

    time.sleep(15)

    u = urllib.request.urlopen(userurl)
    webContent2 = u.read()
    print(webContent2)

    if webContent1 == webContent2:
        print("Everything is normal")
    else:
        print("Warning, there has been a change to the website!")
        change = True
    return change

checksite(userurl)

and output (index.html was edited during the 15-second sleep)

b'<html>\n\t<title> Hello </title>\n\t<body>\n\t\tTesting, Webcontent1 \n\t</body>\n\t</html>\n\n'
b'<html>\n\t<title> Hello </title>\n\t<body>\n\t\tTesting, Webcontent2\n\t</body>\n\t</html>\n\n'
Warning, there has been a change to the website!
[Finished in 17.5s]

Your code is perfectly fine.
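
If you want to check the comparison logic without running a server or waiting out the sleep, one option is to pass the fetch step in as a parameter. This is only a sketch: check_changed, fetch_bytes, and the fetch/delay parameters are names made up here, not part of the original code.

```python
import time
import urllib.request


def fetch_bytes(url):
    """Download the raw body of a page (the same urlopen call the question uses)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def check_changed(url, fetch=fetch_bytes, delay=60):
    """Return True if two fetches of `url`, `delay` seconds apart, differ."""
    first = fetch(url)
    time.sleep(delay)
    second = fetch(url)
    return first != second


# Usage with canned responses instead of a real site (no network involved):
responses = iter([b"<p>hello</p>", b"<p>hello</p>"])
print(check_changed("http://example.invalid", fetch=lambda _: next(responses), delay=0))
# prints False
```

With the default fetch it behaves like the original function, so the same code can be pointed at a real URL once the fake runs give the answers you expect.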

Upvotes: 1

Flari

Reputation: 312

To know whether a website or a page has changed, you need a backup of it stored somewhere; in your code it is as if you were comparing the site to itself. I recommend using the requests library together with BS4 (Beautiful Soup) and parsing the page line by line, comparing each line to the backup you have.

While the lines match (that is, the backup shows the same lines as the live site), a flag variable stays true. As soon as a line differs, break out of the loop and show the line where the site has changed.
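
A rough sketch of that line-by-line comparison, using only the standard library. It assumes the backup and the current page are already available as text (for example, the backup read from a file saved on an earlier run); fetching with requests and parsing with BS4 is left out, and first_changed_line is a made-up name.

```python
from itertools import zip_longest


def first_changed_line(backup_text, current_text):
    """Return (line_number, backup_line, current_line) for the first line
    that differs, or None if the two texts match line for line.
    A line missing from the shorter text is reported as None."""
    pairs = zip_longest(backup_text.splitlines(), current_text.splitlines())
    for number, (old, new) in enumerate(pairs, start=1):
        if old != new:
            return number, old, new
    return None


backup = "<html>\n<body>\nTesting, Webcontent1\n</body>\n</html>"
current = "<html>\n<body>\nTesting, Webcontent2\n</body>\n</html>"
print(first_changed_line(backup, current))
# prints (3, 'Testing, Webcontent1', 'Testing, Webcontent2')
```

Returning the line number and both versions makes it easier to see whether the difference is real content or just something like a timestamp.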

Upvotes: 0

beefoak

Reputation: 146

Try it against a small static HTML Hello World page. Many websites serve dynamic content that changes on every request (and is not necessarily visible on the page), which could explain your "incorrect" results.
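
One possible way to reduce false alarms from that kind of invisible content is to compare only the text a visitor would see. The sketch below uses the standard library's html.parser and skips script and style contents. The VisibleText class and visible_text function are made-up names, and the sample pages are invented for illustration. Note that urlopen(...).read() returns bytes, so a real page would need decoding first.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Attribute values (e.g. hidden form tokens) never reach this method.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def visible_text(html):
    """Return the page's text content as one space-joined string."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# Two invented pages that differ only in a script token and a hidden field.
page1 = '<html><head><script>var token="abc";</script></head><body><p>Hello World</p><input type="hidden" value="1"></body></html>'
page2 = '<html><head><script>var token="xyz";</script></head><body><p>Hello World</p><input type="hidden" value="2"></body></html>'
print(page1 == page2)                              # prints False
print(visible_text(page1) == visible_text(page2))  # prints True
```

This will not catch every source of noise (rotating ads or visible timestamps are still text), but it separates "the bytes changed" from "the readable content changed".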

Upvotes: 1
