Reputation: 13
I'm having some trouble extracting some data from this string:
<input class="mail-address-address" id="mailAddress" readonly="readonly" type="text" value="THE_EMAIL_ADDRESS_HERE"/>).
How could I store the contents of value= in my own variable? I've thought about splitting, but I don't think you can split on a whole word. Could you split at a certain character using the .count() method? Hopefully I can get some help with this.
Thanks
EDIT:
I'm fetching the page's HTML myself and looking the element up by its id, since splinter did not seem to return the content for that id (it was just blank):
site = "https://10minutemail.com/10MinuteMail/index.html?dswid=9902"
req = urllib2.Request(site, headers=hdr)
page = urllib2.urlopen(req)
content = page.read()
soup = BeautifulSoup(content)
address-address" id="mailAddress" readonly="readonly">')
find = soup.find("class", {"id": "mailAddress"})
findId = soup.find(id="mailAddress")
The variable findId prints this:
<input class="mail-address-address" id="mailAddress" readonly="readonly" type="text" value="[email protected]"/>)
@Sidney
html_line= '''<input class="mail-address-address" id="mailAddress" readonly="readonly" type="text" value="[email protected]"/>)'''
input_value=html_line.split('value="',1)[1].rsplit('"',1)[0]
print(input_value)
This works fine, except that the domain name changes, and the hard-coded ''' string means I can't use my own variable (findId). Is there a workaround for this?
Upvotes: 1
Views: 9669
Reputation: 60604
You should really use an html parser to parse html (not a regex or string manipulation). For example, you can use BeautifulSoup.
First, install the package:
pip install beautifulsoup4
Then use it to grab the value from your input tag:
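from bs4 import BeautifulSoup
# html holds the markup to parse; here it is the input tag from the question
html = '<input class="mail-address-address" id="mailAddress" readonly="readonly" type="text" value="THE_EMAIL_ADDRESS_HERE"/>'
soup = BeautifulSoup(html, 'html.parser')
val = soup.input['value']  # val now contains the string 'THE_EMAIL_ADDRESS_HERE'
print(val)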
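If installing a package is not an option, the same "use a parser, not string manipulation" idea can be sketched with the standard library's html.parser module. This is only a rough sketch assuming Python 3; the class name InputValueParser and its target_id parameter are made up for illustration and are not part of any library.

```python
from html.parser import HTMLParser


class InputValueParser(HTMLParser):
    """Hypothetical helper: remembers the value attribute of the
    <input> tag whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.value = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        # Self-closing tags such as <input .../> also end up here by default.
        attr_map = dict(attrs)
        if tag == "input" and attr_map.get("id") == self.target_id:
            self.value = attr_map.get("value")


html = '<input class="mail-address-address" id="mailAddress" readonly="readonly" type="text" value="THE_EMAIL_ADDRESS_HERE"/>'
parser = InputValueParser("mailAddress")
parser.feed(html)
print(parser.value)
```

Matching on the id rather than taking the first input tag keeps this working if the page contains other inputs. With BeautifulSoup, the tag returned by soup.find(id="mailAddress") can be indexed the same way as soup.input above.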
Upvotes: 3
Reputation:
As @Daniel Roseman says, it would be nice to have some more context. Normally, when parsing HTML, you can use a library like BeautifulSoup. A good example for your case is "Python beautifulsoup - getting input value".
If you want to code your own parser, or if you need something simple, you can even use split():
html_line='''<input class="mail-address-address" id="mailAddress" readonly="readonly" type="text" value="THE_EMAIL_ADDRESS_HERE"/>)'''
input_value=html_line.split('value="',1)[1].rsplit('"',1)[0]
I'd still advise you to use BeautifulSoup (and if you want a simple parser, better to use @sidney's answer).
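The triple-quoted literal above is only sample input; the same string operations work on any str, including str(findId) from the question. Here is a minimal sketch of that idea wrapped in a function. The name extract_value and the address someone@example.com are placeholders invented for this example, and it uses str.partition instead of split so that a missing attribute returns None rather than raising IndexError.

```python
def extract_value(html_line):
    """Return the text between value=" and the next double quote, or None."""
    # partition returns (before, separator, after); separator is '' if not found.
    _, marker, rest = html_line.partition('value="')
    if not marker:
        return None
    value, quote, _ = rest.partition('"')
    return value if quote else None


# Stand-in for str(findId); the address is a made-up placeholder.
find_id_text = '<input class="mail-address-address" id="mailAddress" readonly="readonly" type="text" value="someone@example.com"/>'
print(extract_value(find_id_text))
```

This still assumes the attribute is written with double quotes and that the value contains no double quote itself, which is one reason a real parser is the safer choice.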
Upvotes: 2
Reputation: 827
This would be pretty messy to handle using .split(), so I would suggest using regular expressions (if you choose not to use an HTML parsing library). To use regex, import the re module and use the regular expression " +value=\"(.*?)\"", like so:
import re
yourString = '<input class="mail-address-address" id="mailAddress" readonly="readonly" type="text" value="THE_EMAIL_ADDRESS_HERE"/>'
# m is the match object, containing data about the regex search.
m = re.search(" +value=\"(.*?)\"", yourString)
# To retrieve the content captured inside the parentheses inside the regex, look for saved matches.
value = m.group(1)
The regex searches for one or more spaces followed by value=", then captures as few characters as possible up to the next ".
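One thing to watch for: re.search returns None when nothing matches, so calling .group(1) directly would raise AttributeError on a page that lacks the attribute. A small sketch that guards against that, using the same pattern as above (the function name find_value is just an illustrative choice):

```python
import re

# Same pattern as above, written as a raw string so the quotes need no escaping.
VALUE_PATTERN = re.compile(r' +value="(.*?)"')


def find_value(text):
    """Return the captured attribute value, or None if the pattern is absent."""
    m = VALUE_PATTERN.search(text)
    return m.group(1) if m else None


yourString = '<input class="mail-address-address" id="mailAddress" readonly="readonly" type="text" value="THE_EMAIL_ADDRESS_HERE"/>'
print(find_value(yourString))
print(find_value('<input id="mailAddress"/>'))
```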
Upvotes: 2