Reputation: 63
There is an HTML file saved on my hard drive, and I need to extract the strings displayed on the page and save them to a text file using Python.
The HTML source, with tags, looks like this:
Bme: 1 Port: 1<br />
Downstream line rate: 6736 kbps<br />
Upstream line rate: 964 kbps<br />
What I need to extract from the above is the number after "Downstream line rate:" (in this case, 6736) and write it to a file. How can this be achieved?
Upvotes: 1
Views: 489
Reputation: 610
BeautifulSoup is probably overkill for this. If all the "Downstream" lines are formatted like that, you can easily get those numbers with regular expressions.
>>> import re
>>> regex = r'Downstream line rate: (\d+) kbps<br />'
>>> re.search(regex, "Downstream line rate: 6736 kbps<br />").group(1)
'6736'
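If the lines aren't all formatted exactly like that, you might have to make the regex more general, possibly something like r'Downstream\D*(\d+)'. Don't put a greedy .* in front of the digits: with r'Downstream.*(\d+)' the .* swallows everything up to the last digit, so on the sample line the group would capture only '6'.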
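To tie this back to the original question (read the saved file, pull out the number, write it to a text file), here is a minimal sketch. The function names and the file names page.html and rates.txt are made up for illustration, and the UTF-8 encoding is an assumption about how the page was saved.

```python
import re
from pathlib import Path

# Match the label, skip any non-digit characters, then capture the digits.
RATE_RE = re.compile(r'Downstream line rate:\D*(\d+)')

def extract_downstream_rates(html_text):
    """Return every downstream rate found in the text, as strings."""
    return RATE_RE.findall(html_text)

def save_rates(html_path, out_path):
    """Read the HTML file, extract the rates, and write one per line."""
    # errors='replace' keeps odd bytes from aborting the read (assumed UTF-8).
    html_text = Path(html_path).read_text(encoding='utf-8', errors='replace')
    rates = extract_downstream_rates(html_text)
    Path(out_path).write_text(''.join(rate + '\n' for rate in rates),
                              encoding='utf-8')
    return rates

# Hypothetical usage; substitute your own paths:
# save_rates('page.html', 'rates.txt')
```

findall is used so that a page listing several ports yields one number per line in the output file; if there is only ever one match, re.search(...).group(1) as shown above works just as well.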
Upvotes: 2