user1144251

Reputation: 347

How to parse a string in python

Without any third-party libraries (such as Beautiful Soup), what is the cleanest way to parse a string in Python?

Given the text below, I'd like to extract the value of "uber_token", i.e. "123456789".

....

<form id="blah" action="/p-submi.html" method="post"><input type="hidden" id="" name="uber_token" value="123456789"/><div class="container-info">

....

Thanks!

Upvotes: 0

Views: 402

Answers (3)

omu_negru

Reputation: 4770

Python comes with its own XML parsing modules (https://docs.python.org/3.2/library/xml.html?highlight=xml#xml), so you don't need a third-party parsing library. If you're unwilling or not allowed to use those, you can always fall back to regex, but I'd stay clear of that when it comes to parsing XML.
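A minimal sketch of the standard-library route with xml.etree.ElementTree. It assumes the markup is well-formed XML: the snippet in the question leaves <form> and <div> unclosed, so the document string below is a hypothetical, closed-up version of it. ElementTree's limited XPath support includes [@attr='value'] predicates, which makes the lookup a single find() call.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed version of the question's snippet; the original
# leaves <form> and <div> unclosed, which ElementTree would reject.
doc = '''<form id="blah" action="/p-submi.html" method="post">
<input type="hidden" id="" name="uber_token" value="123456789"/>
<div class="container-info"></div>
</form>'''

root = ET.fromstring(doc)  # root is the <form> element

# ".//" searches all descendants; the predicate matches on the name attribute
token_input = root.find(".//input[@name='uber_token']")

# find() returns None when nothing matches, so guard before reading the value
token = token_input.get('value') if token_input is not None else None
print(token)  # 123456789
```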

Upvotes: 0

leewz

Reputation: 3346

Disclaimer: This answer is for quick-and-dirty scripts and may lack robustness and efficiency. The suggestions here should probably not be used in code that needs to survive more than a few hours.

If you're unwilling to learn regex (and you should be willing to learn regex!), you can split on 'value="'. It's probably inefficient, but simple code is easier to debug.

values = []

with open('myfile.txt') as infile:
    for line in infile:
        candidates = line.split('value="')
        for s in candidates[1:]:  # the first piece comes before any value="
            try:  # keep only values that are numbers
                val = int(s.split('"')[0])
            except ValueError:  # not a number, skip it
                continue
            values.append(val)
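The loop above collects every numeric value="..." in the file, not just the token. A sketch of the same split-based idea narrowed to the uber_token field, using str.partition. The helper name extract_attribute is hypothetical, and it assumes the name attribute comes immediately before value, exactly as in the question's snippet.

```python
def extract_attribute(text, marker='name="uber_token" value="'):
    """Return the text between marker and the next double quote, or None."""
    # partition() splits on the first occurrence of marker only;
    # the middle item is '' when marker is not found
    _, found, rest = text.partition(marker)
    if not found:
        return None
    # everything up to the closing quote is the attribute value
    return rest.split('"', 1)[0]


html = ('<form id="blah" action="/p-submi.html" method="post">'
        '<input type="hidden" id="" name="uber_token" value="123456789"/>'
        '<div class="container-info">')
print(extract_attribute(html))  # 123456789
```

This keeps the value as a string, which avoids stripping leading zeros if a token ever starts with 0.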

If you're specifically working with HTML or XML, Python's standard library has parsers for both (for example html.parser and xml.etree.ElementTree in Python 3).

Then, for example, you can write code to search through the tree for a node with an attribute "name" that has value "uber_token", and get the "value" attribute from it.

A very simple example that doesn't require learning much about ElementTree. It works in Python 2 and 3, but ElementTree only accepts well-formed XML, so a messy HTML page may fail to parse:

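import xml.etree.ElementTree as ET

tree = ET.parse('myfile.xml')
root = tree.getroot()

values = []

# iter() visits every element in the tree, not just root's direct children
for element in root.iter():
    # get() returns None instead of raising KeyError when there is no "name"
    if element.get('name') == 'uber_token':
        values.append(element.get('value'))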
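For HTML that isn't well-formed XML, such as the unclosed snippet in the question, the standard library's html.parser module (Python 3; the Python 2 equivalent is the HTMLParser module) is more forgiving than ElementTree. A minimal sketch; the class name TokenFinder is hypothetical.

```python
from html.parser import HTMLParser


class TokenFinder(HTMLParser):
    """Collect the value attribute of any tag whose name attribute matches."""

    def __init__(self, field_name):
        super().__init__()
        self.field_name = field_name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        attributes = dict(attrs)
        if attributes.get('name') == self.field_name:
            self.values.append(attributes.get('value'))

    # Self-closing tags like <input .../> go to handle_startendtag, whose
    # default implementation calls handle_starttag and then handle_endtag.


html = ('<form id="blah" action="/p-submi.html" method="post">'
        '<input type="hidden" id="" name="uber_token" value="123456789"/>'
        '<div class="container-info">')

finder = TokenFinder('uber_token')
finder.feed(html)
finder.close()
print(finder.values)  # ['123456789']
```

Unlike the string-splitting and regex approaches, this does not depend on attribute order or spacing inside the tag.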

Upvotes: 0

Suku

Reputation: 3880

A regular expression is the simplest solution here, using the built-in re module:

>>> import re
>>> s = '<form id="blah" action="/p-submi.html" method="post"><input type="hidden" id="" name="uber_token" value="123456789"/><div class="container-info"'
>>> match = re.search(r'name="uber_token" value="([0-9]+)"', s)
>>> print(match.group(1))
123456789
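A slightly more defensive Python 3 sketch of the same regex. It still assumes name="uber_token" comes before value (as in the question), but tolerates any whitespace between the two attributes, accepts non-numeric tokens, and handles the case where re.search finds nothing and returns None.

```python
import re

s = ('<form id="blah" action="/p-submi.html" method="post">'
     '<input type="hidden" id="" name="uber_token" value="123456789"/>'
     '<div class="container-info">')

# \s+ allows any whitespace between the attributes; [^"]* captures
# everything up to the closing quote, digits or not
match = re.search(r'name="uber_token"\s+value="([^"]*)"', s)

# re.search returns None when there is no match, so check before group()
token = match.group(1) if match else None
print(token)  # 123456789
```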

Upvotes: 2
