ISH91

Reputation: 41

Parse HTML value in Python

I have the following HTML:

<td>
   <input maxlen="1" name="db" size="1" type="text" value="25"/>
   <div style="display:inline-block;position:relative;top:6px;left:0px;width:20px;">
    <input class="p_b" name="ta" style="height:1em; width:1.5em;line-height:1em;padding:0px;margin:0px;border:0px;background-color:#f3f3f3" type="submit" value="▴"/>
    <input class="p_b" name="ta" style="height:1em; width:1.5em;line-height:1em;padding:0px;margin:0px;border:0px;background-color:#f3f3f3" type="submit" value="▾"/>
   </div>
   <span style="position:relative;top:8px">
    
   </span>
   <input maxlen="1" name="dc" size="1" type="text" value="0"/>
   <div style="display:inline-block;position:relative;top:6px;left:0px;width:20px;">
    <input class="p_b" name="tb" style="height:1em; width:1.5em;line-height:1em;padding:0px;margin:0px;border:0px;background-color:#f3f3f3" type="submit" value="▴"/>
    <input class="p_b" name="tb" style="height:1em; width:1.5em;line-height:1em;padding:0px;margin:0px;border:0px;background-color:#f3f3f3" type="submit" value="▾"/>
   </div>
  </td>

I need to extract both numbers from value="25" and value="0". I made a workaround like:

y = soup.findAll('input', {'type':'text'})
a = re.findall(r'(?<=value=")(\d*)', str(y))

But I think there should be a more direct way to do it via the parser. Can anyone help with it?

Upvotes: 4

Views: 67

Answers (1)

Parolla

Reputation: 407

Try the line below to extract the value attribute from each text input node:

values = [element['value'] for element in soup.findAll('input', {'type':'text'})]

P.S. Note that using regex to parse HTML when web scraping is bad practice. There are plenty of parsing tools that can easily do this for you (for instance, BeautifulSoup and lxml in Python).
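
To illustrate the "parser instead of regex" idea without any third-party dependency, here is a minimal sketch using the standard library's html.parser. The class name TextInputValues and the helper text_input_values are made up for this example, and the markup is a trimmed copy of the snippet from the question.

```python
from html.parser import HTMLParser

# Trimmed copy of the markup from the question (styles removed for brevity).
HTML = """
<td>
  <input maxlen="1" name="db" size="1" type="text" value="25"/>
  <input class="p_b" name="ta" type="submit" value="▴"/>
  <input maxlen="1" name="dc" size="1" type="text" value="0"/>
  <input class="p_b" name="tb" type="submit" value="▾"/>
</td>
"""


class TextInputValues(HTMLParser):
    """Hypothetical helper: collects the value of every <input type="text">."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; self-closing tags such as
        # <input .../> are also routed here by the default implementation.
        attr_map = dict(attrs)
        if tag == "input" and attr_map.get("type") == "text":
            self.values.append(attr_map.get("value"))


def text_input_values(html):
    """Parse the given HTML string and return the collected values."""
    parser = TextInputValues()
    parser.feed(html)
    parser.close()
    return parser.values


values = text_input_values(HTML)
print(values)  # ['25', '0']
```

The values come back as strings, so wrap them in int() if you need numbers. The submit buttons are skipped because only inputs with type="text" are collected.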

Upvotes: 1
