Reputation: 347
Without any third-party libraries (such as Beautiful Soup), what is the cleanest way to parse a string in Python?
Given the text below, I'd like the content of "uber_token" to be parsed out, i.e. "123456789".
....
<form id="blah" action="/p-submi.html" method="post"><input type="hidden" id="" name="uber_token" value="123456789"/><div class="container-info">
....
Thanks!
Upvotes: 0
Views: 402
Reputation: 4770
Python comes with its own XML parsing module: https://docs.python.org/3.2/library/xml.html?highlight=xml#xml so you don't have to use any third-party parsing library. If you're unwilling or not allowed to use that, you can always drop to regex, but I'd stay clear of that when it comes to parsing XML.
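As a rough illustration of the standard-library route on a string rather than a file, here is a minimal Python 3 sketch using xml.etree.ElementTree. It assumes the fragment is well-formed XML, so the closing </div> and </form> tags below were added for the example and are not part of the original snippet; real-world HTML often fails this assumption.

```python
import xml.etree.ElementTree as ET

# Assumption: the fragment is well-formed XML (every tag is closed).
# The closing </div> and </form> were added for this sketch.
fragment = (
    '<form id="blah" action="/p-submi.html" method="post">'
    '<input type="hidden" id="" name="uber_token" value="123456789"/>'
    '<div class="container-info"></div>'
    '</form>'
)

root = ET.fromstring(fragment)

# iter() walks the whole tree, not only the direct children of the root.
token = next(
    (el.get("value") for el in root.iter("input") if el.get("name") == "uber_token"),
    None,  # fallback when no matching <input> exists
)
print(token)
```

If the markup is not well-formed, ET.fromstring raises ParseError, which is one reason an HTML-aware parser may be a better fit for scraped pages.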
Upvotes: 0
Reputation: 3346
Disclaimer: This answer is for quick-and-dirty scripts, and may lack in robustness and efficiency. Suggestions here should probably not be used for code that survives more than a few hours.
If you're unwilling to learn regex (and you should be willing to learn regex!), you can split on value=". It's probably really inefficient, but simple is easier to debug.
values = []
with open('myfile.txt') as infile:
    for line in infile:
        candidates = line.split('value="')
        for s in candidates[1:]:  # the first token is not a value
            try:  # test if value is a number
                val = int(s.split('"')[0])
            except ValueError:  # not a number, skip it
                continue
            values.append(val)
If you're specifically looking at HTML or XML, Python has libraries for both.
HTMLParser: https://docs.python.org/2/library/htmlparser.html
ElementTree: https://docs.python.org/2/library/xml.etree.elementtree.html
Then, for example, you can write code to search through the tree for a node with an attribute "name" that has value "uber_token", and get the "value" attribute from it.
Very dumb Python 2 example that doesn't require learning too much about ElementTrees (may need simple corrections):
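import xml.etree.ElementTree as ET
tree = ET.parse('myfile.xml')
root = tree.getroot()
values = []
for element in root.iter():  # walk every element, not only the root's direct children
    if element.get('name') == 'uber_token':  # .get() returns None if the attribute is missing
        values.append(element.get('value'))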
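For HTML that is not well-formed XML (the snippet in the question never closes its div), the HTMLParser route mentioned above may be more forgiving. The following is a minimal sketch for Python 3, where the module lives at html.parser; the class name TokenFinder and its field_name parameter are made up for illustration.

```python
from html.parser import HTMLParser  # Python 3; the Python 2 module is named HTMLParser


class TokenFinder(HTMLParser):
    """Hypothetical helper: collects the value attribute of tags with a given name attribute."""

    def __init__(self, field_name):
        super().__init__()
        self.field_name = field_name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (attribute, value) pairs.
        attributes = dict(attrs)
        if attributes.get("name") == self.field_name:
            self.values.append(attributes.get("value"))


html = (
    '<form id="blah" action="/p-submi.html" method="post">'
    '<input type="hidden" id="" name="uber_token" value="123456789"/>'
    '<div class="container-info">'
)

finder = TokenFinder("uber_token")
finder.feed(html)
finder.close()
print(finder.values)
```

Self-closing tags such as the input above go through handle_startendtag, whose default implementation calls handle_starttag, so they are picked up without extra code.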
Upvotes: 0
Reputation: 3880
A regular expression is the solution. Use the re module:
>>> import re
>>> s = '<form id="blah" action="/p-submi.html" method="post"><input type="hidden" id="" name="uber_token" value="123456789"/><div class="container-info"'
>>> regex=re.search(r'name="uber_token" value="([0-9]+)"',s)
>>> print(regex.group(1))
123456789
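The pattern above expects name to come immediately before value and the value to be all digits. If that cannot be guaranteed, one possible variation is to look at each input tag on its own and search for the two attributes separately. This is only a sketch: find_hidden_value is a hypothetical helper, and it assumes double-quoted attributes and no ">" inside attribute values.

```python
import re

s = ('<form id="blah" action="/p-submi.html" method="post">'
     '<input type="hidden" id="" name="uber_token" value="123456789"/>'
     '<div class="container-info"')


def find_hidden_value(html, field_name):
    """Hypothetical helper: return the value of the first <input> whose name matches."""
    # Examine each <input ...> tag separately so attribute order does not matter.
    for tag in re.findall(r'<input\b[^>]*>', html):
        if re.search(r'\bname="%s"' % re.escape(field_name), tag):
            match = re.search(r'\bvalue="([^"]*)"', tag)
            if match:
                return match.group(1)
    return None  # no matching tag found


print(find_hidden_value(s, "uber_token"))
```

This is still pattern matching rather than real parsing, so it shares the usual fragility of regex on markup.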
Upvotes: 2