Reputation: 107
I need a Python script that takes an email, finds a link in it, and then uses that link to send a request to a server so that the account is verified. How would I use Python to extract
https://www.boomlings.com/database/accounts/activate.php?uid=8722046actcode=xLCReGjLdkWmINt1GY9e
out of
{'Sender': 'Geometry Dash', 'Subject': 'Please activate your account.', 'body': b'<style type="text/css">\n#google_translate_element{\n float: right;\n padding:0 0 10px 10px;\n}\n/* twitter do\xc4\x9frulama linki fix */\n.bulletproof-btn-1 a {\n font-size: 20px!important;\n color: #fff!important;\n padding: 20px!important;\n line-height: 33px!important;\n text-decoration: none!important;\n}\n</style>\n<div id="google_translate_element"></div><script type="text/javascript">\nfunction googleTranslateElementInit() {\n new google.translate.TranslateElement({pageLanguage: \'en\', layout: google.translate.TranslateElement.InlineLayout.SIMPLE, autoDisplay: false, multilanguagePage: true}, \'google_translate_element\');\n}\n</script><script type="text/javascript" src="//translate.google.com/translate_a/element.js?cb=googleTranslateElementInit"></script>\n\r\n\r\n<html>\r\n<head>\r\n\t<title></title>\r\n</head>\r\n<body>\r\n<p>Thank you for registering a Geometry Dash account</p>\r\n\r\n<p>Your account information:<br />\r\nUsername: SUKAFUTCUCK</p>\r\n\r\n<p>Please click the link below to activate your account:<br />\r\n<a href="http://www.boomlings.com/database/accounts/activate.php?uid=8722046&actcode=xlCReGjLdkWmINt1GY9e" target="_blank">Click\r\nHere</a></p>\r\n\r\n<p>Please contact [email protected] if you have any questions or\r\nneed assistance.</p>\r\n\r\n<p>If you did not send an account request using this email, then you\r\ncan safely disregard this message and nothing will happen.</p>\r\n\r\n<p>Regards,<br />\r\nRobTop Games</p>\r\n</body>\r\n</html>\r\n\r\n\r\n'}
The link will be different in each email, so I need something that can match this pattern:
https://www.boomlings.com/database/accounts/activate.php?uid=*actcode=*
Here * means that a string of any length can go there, because each email will have a different activate.php code.
Upvotes: 0
Views: 350
Reputation: 3382
Assuming that dict from your description is now in a variable named d
(it was just a bit long to put in here):
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(d['body'], 'lxml')
>>> link = soup.find('a', target='_blank')
>>> link['href']
'http://www.boomlings.com/database/accounts/activate.php?uid=8722046&actcode=xlCReGjLdkWmINt1GY9e'
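If installing bs4 and lxml is not an option, the same idea can be sketched with the standard library's html.parser. This is only a rough equivalent, not the approach above: the names LinkCollector and find_activation_link, the "activate.php" marker, and the shortened sample_body are assumptions made for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_activation_link(body, marker="activate.php"):
    """Return the first link containing `marker`, or None if there is none."""
    if isinstance(body, bytes):
        body = body.decode("utf-8", errors="replace")
    parser = LinkCollector()
    parser.feed(body)
    parser.close()
    return next((h for h in parser.hrefs if marker in h), None)


# Shortened stand-in for d['body'] from the question (hypothetical sample).
sample_body = (
    b'<p>Please click the link below to activate your account:<br />\r\n'
    b'<a href="http://www.boomlings.com/database/accounts/activate.php'
    b'?uid=8722046&actcode=xlCReGjLdkWmINt1GY9e" target="_blank">Click\r\n'
    b'Here</a></p>'
)
print(find_activation_link(sample_body))
```

Filtering on the href content rather than on target="_blank" should keep working if the email ever gains a second link, though that is a guess about how the emails might change.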
Upvotes: 1
Reputation: 5388
The email could be in HTML or text format.
If it's in HTML format, then use a library like bs4, pyquery, etc.
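If it's text, then use a regex to search for the URL. RFC 3986 gives the following regex:
regex = ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
Refer: http://www.ietf.org/rfc/rfc3986.txt
Note that this pattern is anchored with ^ and is meant to split a single URI into its components, so it will not find URLs in the middle of a larger text. To search a text, use the re module with an unanchored pattern such as
import re
# Match http(s) URLs up to the next whitespace, quote, or angle bracket
regex = r"https?://[^\s\"'<>]+"
urls = re.findall(regex, text)
print(urls)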
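Once a single URL has been found, splitting it into the RFC 3986 components does not need a hand-written regex: the standard library's urllib.parse does this. The sketch below uses the link from the question's email and pulls out the uid and actcode values.

```python
from urllib.parse import urlsplit, parse_qs

# The activation link as it appears in the question's email body.
url = ("http://www.boomlings.com/database/accounts/activate.php"
       "?uid=8722046&actcode=xlCReGjLdkWmINt1GY9e")

parts = urlsplit(url)            # scheme, netloc, path, query, fragment
params = parse_qs(parts.query)   # maps each parameter name to a list of values

print(parts.scheme)              # http
print(parts.netloc)              # www.boomlings.com
print(parts.path)                # /database/accounts/activate.php
print(params["uid"][0])          # 8722046
print(params["actcode"][0])      # xlCReGjLdkWmINt1GY9e
```

Checking parts.netloc and parts.path before using a link is one way to make sure the script only follows activation URLs from the expected host.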
Use the pyquery module
from pyquery import PyQuery as pq
q = pq(text)
# .items() yields PyQuery objects, so .attr() is available on each link
urls = [a.attr("href") for a in q("a").items()]
print(urls)
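EDIT:
Instead of a generic URL pattern, we can match the specific URL, for example https?://www\.boomlings\.com/database/accounts/activate\.php\?uid=\d+&actcode=\w+
Here \d+ and \w+ stop at the closing quote of the link, whereas a greedy .* would run on to the end of the line.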
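A URL-specific pattern like this can be tried out with a short sketch. It assumes that uid is always numeric and that actcode consists of letters and digits, as in the sample email; the optional amp; part is an extra assumption to cope with bodies where the ampersand is HTML-escaped. The names ACTIVATION_RE and extract_activation_url are made up for the example.

```python
import html
import re

# Assumption: uid is digits only and actcode is letters/digits/underscore.
ACTIVATION_RE = re.compile(
    r"https?://www\.boomlings\.com/database/accounts/activate\.php"
    r"\?uid=\d+&(?:amp;)?actcode=\w+"
)


def extract_activation_url(body):
    """Return the activation URL found in an email body, or None."""
    if isinstance(body, bytes):
        body = body.decode("utf-8", errors="replace")
    match = ACTIVATION_RE.search(body)
    # html.unescape turns a possible "&amp;" back into "&".
    return html.unescape(match.group()) if match else None


body = (b'<a href="http://www.boomlings.com/database/accounts/activate.php'
        b'?uid=8722046&actcode=xlCReGjLdkWmINt1GY9e" target="_blank">Click</a>')
print(extract_activation_url(body))
```

Because the pattern names the host and path explicitly, it works the same way on HTML and plain-text emails.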
Upvotes: 0
Reputation: 4213
You can use a regex for that, with something like:
import re
# Capture only the URL between the quotes of the first <a href="...">
c = re.search(r'<a href="(.*?)"', yourDict["body"].decode("utf-8"))
print(c.group(1))
but it is much better to use a package like parsel, because then you extract from the HTML with XPath rather than with a regex.
EDIT
I used the regular expression because it is the shortest and fastest way, with no need to download a package, but if the format of your response changes drastically I recommend parsel. Example:
from parsel import Selector
sel = Selector(text=yourDict["body"].decode("utf-8"))
url = sel.xpath('//a[@target="_blank"]/@href').extract_first()
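With the URL extracted, the remaining step from the question is to visit it. A minimal sketch using only the standard library follows; it assumes the server activates the account on a plain GET request, the way a browser click would, which is not confirmed anywhere in this thread. The function names and the User-Agent value are placeholders.

```python
import urllib.request


def build_activation_request(url):
    """Prepare a GET request for the activation link (nothing is sent yet)."""
    # The User-Agent string is an arbitrary placeholder, not a required value.
    return urllib.request.Request(
        url,
        headers={"User-Agent": "activation-script/0.1"},
        method="GET",
    )


def activate(url, timeout=10):
    """Send the request and return (HTTP status code, response text)."""
    request = build_activation_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        text = response.read().decode("utf-8", errors="replace")
        return response.status, text


# Usage (performs a real network request, so it is left commented out):
# status, text = activate(url)
# print(status, text[:200])
```

urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, so a script that processes many emails would probably want to catch that and log which link failed.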
Upvotes: 2