BeautifulSoup can't find an existing href in the file

Question

I have an HTML file like the following:



next_page
 

 1/3

How can I extract the "1/3" from the file?

This is only part of the HTML; I trimmed it to keep the question clear. When I use BeautifulSoup,

I'm new to BeautifulSoup and have read the documentation, but I'm still confused.

How can I extract "1/3" from the HTML file?

total_urls_num = re.findall('\d+/\d+',response)

My working code:

from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 (Python 2)
import re

with open("html.txt", "r") as f:
    response = f.read()

print response
soup = BeautifulSoup(response)

# Links whose href contains "follow?page"; the "?" must be escaped in the regex.
delete_urls = soup.findAll('a', href=re.compile(r'follow\?page'))
print delete_urls

# The regex approach on the raw text:
# total_urls_num = re.findall(r'\d+/\d+', response)
total_urls_num = soup.find('input', type='submit')  # returns the tag only, not the text after it
print total_urls_num

DSM · Accepted Answer

I think the problem is that the text you're searching for isn't an attribute of some tag, but a separate piece of text that comes after the tag. You can access it using .next:

In [144]: soup.find("input", type="submit")
Out[144]: 

In [145]: soup.find("input", type="submit").next
Out[145]: u' 1/3\n'

and you can then get the 1/3 from that however you like:

In [146]: re.findall('\d+/\d+', _)
Out[146]: [u'1/3']

or simply something like:

In [153]: soup.findAll("input", type="submit", text=re.compile("\d+/\d+"))
Out[153]: [u' 1/3\n']
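
If the two numbers are needed separately, for instance as a current page and a page count (that reading of "1/3" is my assumption, the question doesn't say), a pattern with capture groups can split them. A minimal sketch using only the standard library; `parse_page_counter` is a hypothetical helper name, not part of BeautifulSoup:

```python
import re

# Two capture groups: the number before the slash and the number after it.
PAGE_COUNTER = re.compile(r'(\d+)\s*/\s*(\d+)')

def parse_page_counter(text):
    """Return (current, total) as ints, or None if no 'N/M' pattern is present."""
    match = PAGE_COUNTER.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

# The text node that follows the submit button in the session above.
current, total = parse_page_counter(u' 1/3\n')
```

Returning None for text without a counter lets the caller decide how to handle a page where the fragment is missing.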

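The session above uses the old BeautifulSoup 3 API. For anyone on BeautifulSoup 4 (the `bs4` package, assumed to be installed), a rough equivalent might look like the sketch below. The sample markup is made up for illustration; the only part that matters is a submit `input` followed by bare text.

```python
import re
from bs4 import BeautifulSoup  # BeautifulSoup 4; assumed to be installed

# Made-up markup for illustration: a submit button followed by bare text.
html = '<a href="follow?page=2">next_page</a> <input type="submit" value="go"/> 1/3\n'
soup = BeautifulSoup(html, "html.parser")

# The text is a sibling node that follows the <input> tag, not one of its attributes.
button = soup.find("input", type="submit")
trailing_text = button.next_sibling

# Pull the "N/M" fragment out of that text node.
page_counter = re.findall(r"\d+/\d+", trailing_text)

# Alternatively, search the document's text nodes directly with a compiled pattern.
matches = soup.find_all(string=re.compile(r"\d+/\d+"))
```

Here `next_sibling` and the `string=` argument play the roles that `.next` and `text=` play in the BeautifulSoup 3 session.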