user2779846

Reputation: 13

python 3.3 search for match in webpage results

The most recent version of my working script has been included at the bottom of the post. I am looking into how to wiki this.

Good day. I have the following code, and I am wondering how to search the results for a match. I will be trying to match two to three words. I have tried html2text, BeautifulSoup, re.search, and several others. Either I have not implemented the things I've tried correctly, or they just don't work.

import requests

s = requests.session()

url = 'http://company.name.com/donor/index.php'
values = {'username': '1234567',
          'password': '7654321'}

r = s.post(url, data=values)

# page which requires being logged in to view
url = "http://company.name.com/donor/donor.php"

# sending cookies as well
result = s.get(url)

I've tried many different ways and just can't get it. Which module will I need to work with? And will I need to change the form of the data that "result" is in? One thing I haven't tried is writing "result" to a text file. I guess I could do that and then search for my matches in that file... I'm just thinking there must be a very simple way to do this.

Thanks for any help or direction.

Updated/Edited Script:

## Script will log in, navigate to the correct page, search and match, then print and text/SMS the result.

import re
import urllib
import smtplib
import requests
from bs4 import BeautifulSoup

s = requests.session()

url = 'http://company.name.com/donor/index.php'
values = {'username': '123456',
          'password': '654321'}

r = s.post(url, data=values)

# Now you have logged in
url = "http://company.name.com/donor/donor.php"

# sending cookies as well
result = s.get(url)

print (result.headers)
print (result.text)

result2 = (result.text)
match1 = re.findall('FindMe', result2)     # we are trying to find "FindMe" in "result2"

if match1:                                 # if we find at least one match
   matchresult = 'Yes it matched'
   print(matchresult)
else:                                      # if we don't find a match
   matchresult = 'Houston we have a problem'
   print(matchresult)

# send text from gmail account portion of code starts here.

body = matchresult

body = "" + body + ""

headers = ["From: " + 'Senders Name',
           "Subject: " + 'Type Subject Information',
           "To: " + '[email protected]',  #phone number and cell carrier @address
           "MIME-Version: 1.0",
           "Content-Type: text/html"]
headers = "\r\n".join(headers)

session = smtplib.SMTP('smtp.gmail.com', 587)

session.ehlo()
session.starttls()
session.ehlo()
session.login('[email protected]', 'passwordforemailaddress')

session.sendmail('senders name', '[email protected]', headers + "\r\n\r\n" + body)
session.quit()

Upvotes: 0

Views: 4335

Answers (1)

tobias_k

Reputation: 82899

Still not sure whether I understood the question correctly, but based on the additional information from your comment, it should suffice to do something like this:

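from urllib.request import urlopen  # Python 3 replacement for urllib2

page = urlopen("http://your.url.com")
# read() returns bytes in Python 3; decoding as UTF-8 is an assumption about the page
content = page.read().decode("utf-8", errors="replace")
if "congratulations" in content:
    print(...)
if "We're sorry" in content:
    print(...)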

As you are looking for very specific words, there is no need to use regular expressions to match some more general pattern, or an HTML parser to look into the structure of the document. Just see whether the string is in the document.
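
If you need to check for two or three phrases at once, the same substring test can be wrapped in a small helper. This is only a rough sketch: `find_phrases` and the sample HTML are made up for illustration, and lowercasing both sides is my assumption that you don't care about capitalisation.

```python
def find_phrases(text, phrases):
    """Return the phrases that occur in text, ignoring case."""
    lowered = text.lower()
    # Plain substring test: no regular expression or HTML parser involved
    return [phrase for phrase in phrases if phrase.lower() in lowered]


# Hypothetical page content standing in for the real response body
sample = "<html><body><p>Congratulations, you are eligible!</p></body></html>"

found = find_phrases(sample, ["congratulations", "We're sorry"])
print(found)
```

With the requests session from the question, you would presumably pass `result.text` instead of `sample`, since `.text` is already a decoded string.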

Upvotes: 1
