Reputation: 2511
I need to extract specific strings from an offline HTML document and write them cleanly to a *.txt file.
So, for example, let's assume that this is a section of the HTML file:
<span id="dataView01">001.00 SPL</span>
<span id="dataView02">543.00 SPL</span>
<span id="dataView03">056.00 SPL</span>
<span id="dataView04">228.00 SPL</span>
I need to get this as a result:
001.00 SPL
543.00 SPL
056.00 SPL
228.00 SPL
Could you please help me with this? Thanks.
Upvotes: 0
Views: 173
Reputation: 169304
Use an HTML parser like BeautifulSoup.
Example:
from bs4 import BeautifulSoup as bs
import re

markup = '''<span id="dataView01">001.00 SPL</span>
<span id="dataView02">543.00 SPL</span>
<span id="dataView03">056.00 SPL</span>
<span id="dataView04">228.00 SPL</span>'''

# Name the parser explicitly so bs4 does not have to guess one.
soup = bs(markup, 'html.parser')

# Match ids made of the literal prefix "dataView" followed by digits.
tags = soup.find_all('span', id=re.compile(r'^dataView\d+$'))
for t in tags:
    print(t.text)
Result:
001.00 SPL
543.00 SPL
056.00 SPL
228.00 SPL
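A note on the id filter: in a regular expression, square brackets form a character class, so a pattern like `[dataView]\d+` matches any single one of those letters followed by digits rather than the literal prefix `dataView`. BeautifulSoup tests a compiled pattern with `search()`, so the looser form would also accept unrelated ids. A minimal sketch of the difference, using made-up ids purely for illustration:

```python
import re

# Character class: ONE of the letters d, a, t, V, i, e, w, then digits.
loose = re.compile(r'[dataView]\d+')
# Literal prefix "dataView", anchored to the whole id.
strict = re.compile(r'^dataView\d+$')

# Hypothetical ids; only the dataViewNN ones are wanted.
ids = ['dataView01', 'preview99', 'dataView04', 'cell7']

loose_hits = [i for i in ids if loose.search(i)]
strict_hits = [i for i in ids if strict.search(i)]

print(loose_hits)   # also picks up 'preview99' via "w99"
print(strict_hits)  # only the dataViewNN ids
```

With the sample markup from the question both patterns select the same four spans; the anchored form only matters once the real document contains other ids ending in a letter plus digits.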
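Next step: write to a .txt file:
import csv

# Python 3: open in text mode with newline='' so csv controls line endings.
with open('output.txt', 'w', newline='') as fou:
    csv_writer = csv.writer(fou)
    for tag in tags:
        # "001.00 SPL" -> ['001.00', 'SPL'], written as the row 001.00,SPL
        split_on_whitespace = tag.text.split()
        csv_writer.writerow(split_on_whitespace)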
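Note that `csv.writer` separates the two tokens with a comma. For the one-value-per-line file shown in the question, a plain write loop is probably enough. The sketch below also swaps BeautifulSoup for the standard library's `html.parser`, purely so that it runs without third-party packages; the class and function names (`SpanTextCollector`, `extract_values`, `write_values`) are made up for illustration, and it assumes the target spans are not nested inside each other.

```python
from html.parser import HTMLParser
import re


class SpanTextCollector(HTMLParser):
    """Collects the text of <span> tags whose id looks like dataViewNN."""

    def __init__(self):
        super().__init__()
        self._inside = False  # True while inside a matching <span>
        self.values = []

    def handle_starttag(self, tag, attrs):
        span_id = dict(attrs).get('id') or ''
        if tag == 'span' and re.fullmatch(r'dataView\d+', span_id):
            self._inside = True
            self.values.append('')

    def handle_data(self, data):
        if self._inside:
            self.values[-1] += data

    def handle_endtag(self, tag):
        # Assumes no nested spans: any </span> closes the current one.
        if tag == 'span':
            self._inside = False


def extract_values(markup):
    """Return the stripped text of every matching span, in document order."""
    parser = SpanTextCollector()
    parser.feed(markup)
    parser.close()
    return [v.strip() for v in parser.values]


def write_values(values, path):
    """Write one value per line to a plain text file."""
    with open(path, 'w') as fout:
        for value in values:
            fout.write(value + '\n')


markup = '''<span id="dataView01">001.00 SPL</span>
<span id="dataView02">543.00 SPL</span>
<span id="dataView03">056.00 SPL</span>
<span id="dataView04">228.00 SPL</span>'''

values = extract_values(markup)
write_values(values, 'output.txt')
```

For a real offline document, the `markup` string would instead be read from the HTML file with `open(...).read()` before calling `extract_values`.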
Upvotes: 3
Reputation: 1147
import re
s='001.00 SPL 543.00 SPL 056.00 SPL 228.00 SPL'
print(re.search(r'(\d{3}\.\d{2}\sSPL\s\d{3}\.\d{2}\sSPL\s\d{3}\.\d{2}\sSPL\s\d{3}\.\d{2}\sSPL)', s).group())
I don't know the surrounding text in the HTML document, but this might work.
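The pattern above spells out the same sub-pattern four times, so it only fits exactly four readings. A shorter sketch, assuming every reading has the form of three digits, a dot, two digits, and `SPL`, uses `re.findall` to pull each one out individually:

```python
import re

s = '001.00 SPL 543.00 SPL 056.00 SPL 228.00 SPL'

# One reading = three digits, a dot, two digits, whitespace, then "SPL".
readings = re.findall(r'\d{3}\.\d{2}\sSPL', s)

# Print one reading per line, however many there are.
for reading in readings:
    print(reading)
```

This works on the flat string shown here; on real HTML it would also match any text elsewhere in the page that happens to have the same shape.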
I see your edit; I will update mine.
Actually, go with jldupont's answer.
Upvotes: 0