LoopTurn
LoopTurn

Reputation: 107

python Get a link from string

I need to use a python script to take a email and fine a link from it and them open use that link to send a packet to a server that has that verification link inside of it so it verifies an account. How would I use python to take the

https://www.boomlings.com/database/accounts/activate.php?uid=8722046actcode=xLCReGjLdkWmINt1GY9e

out of

{'Sender': 'Geometry Dash', 'Subject': 'Please activate your account.', 'body': b'<style type="text/css">\n#google_translate_element{\n  float: right;\n  padding:0 0 10px 10px;\n}\n/* twitter do\xc4\x9frulama linki fix */\n.bulletproof-btn-1 a {\n  font-size: 20px!important;\n  color: #fff!important;\n  padding: 20px!important;\n  line-height: 33px!important;\n  text-decoration: none!important;\n}\n</style>\n<div id="google_translate_element"></div><script type="text/javascript">\nfunction googleTranslateElementInit() {\n  new google.translate.TranslateElement({pageLanguage: \'en\', layout: google.translate.TranslateElement.InlineLayout.SIMPLE, autoDisplay: false, multilanguagePage: true}, \'google_translate_element\');\n}\n</script><script type="text/javascript" src="//translate.google.com/translate_a/element.js?cb=googleTranslateElementInit"></script>\n\r\n\r\n<html>\r\n<head>\r\n\t<title></title>\r\n</head>\r\n<body>\r\n<p>Thank you for registering a Geometry Dash account</p>\r\n\r\n<p>Your account information:<br />\r\nUsername:&nbsp; SUKAFUTCUCK</p>\r\n\r\n<p>Please click the link below to activate your account:<br />\r\n<a href="http://www.boomlings.com/database/accounts/activate.php?uid=8722046&actcode=xlCReGjLdkWmINt1GY9e" target="_blank">Click\r\nHere</a></p>\r\n\r\n<p>Please contact [email protected] if you have any questions or\r\nneed assistance.</p>\r\n\r\n<p>If you did not send an account request using this email, then you\r\ncan safely disregard this message and nothing will happen.</p>\r\n\r\n<p>Regards,<br />\r\nRobTop Games</p>\r\n</body>\r\n</html>\r\n\r\n\r\n'}

The link will be different in different emails so I need something that can do this.

https://www.boomlings.com/database/accounts/activate.php?uid=*actcode=*

When the * means that string at any length can go there because it will be a different activate.php cod

Upvotes: 0

Views: 350

Answers (3)

G_M
G_M

Reputation: 3382

Assuming that dict from your description is now in a variable named d (it was just a bit long to put in here):

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(d['body'], 'lxml')
>>> link = soup.find('a', target='_blank')
>>> link['href']
'http://www.boomlings.com/database/accounts/activate.php?uid=8722046&actcode=xlCReGjLdkWmINt1GY9e'

BeautifulSoup docs

Upvotes: 1

sonus21
sonus21

Reputation: 5388

The email could in HTML or text format. If it's in HTML format then use libraries like bs4, pyquery etc.

If it's text then use regex to search the URL using the following regex

regex = ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?

Refer: http://www.ietf.org/rfc/rfc3986.txt

Use re module to search the string as

import re
regex = r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
urls = re.findall( regex, text )
print(urls)

Use pyquery module

from pyquery import pyQuery as pq
q = pq( text )
a_list = q( "a" )
urls = [ a.attr[ 'href' ] for a in a_list ]
print(urls)

EDIT:

Instead of using generic URL we can use specific URL, for example https?:\/\/www\.boomlings\.com\/database\/accounts\/activate\.php\?uid=.*&actcode=.*

https://ideone.com/NFj90L

Upvotes: 0

Mauricio Cortazar
Mauricio Cortazar

Reputation: 4213

You can use regex for that with something like:

import re
c = re.search("<a href=\".*?(?=\")", yourDict["body"].decode("utf-8"))
print(c.group())

but is much better if you find a package like parsel because you extract the html with xpath and not with regex, check this

EDIT

I use the regular expression because is the shortest and the fastest way with no need of download a package, but if your response changes drastically I recommend parsel for that. Example:

from parsel import Selector
sel = Selector(text=yourDict["body"].decode("utf-8"))
url = sel.xpath('//a[@target="_blank"]/@href').extract_first()

Upvotes: 2

Related Questions