Reputation: 711
I have the following HTML:
<span id="lbldiv" class="lbl" style="color:Blue;">
Division : First; Grand Total: 3861; Grand Max Total: 4600
</span>
I can extract the text "Division : First; Grand Total: 3861; Grand Max Total: 4600" by calling get_text() on the span element.
Is it possible to extract just the numbers, 3861 and 4600, from that text (that is, keep the digits and skip the letters) using the Beautiful Soup library in Python?
Upvotes: 1
Views: 714
Reputation: 1738
Your data looks regular: key-value pairs separated by semicolons, with a colon between each key and its value. The function below extracts them into (key, value) tuples and converts a value to an integer where possible. You can then keep only the pairs whose value is a number, as the list comprehension after the function does.
def extract_kv_pairs(s):
    """Extract key-value pairs separated by colons and semicolons."""
    kvp = []
    for r in s.split(';'):
        k, v = r.split(':')
        # is it an integer?
        try:
            # yes, convert it
            v = int(v)
        except ValueError:
            # no, trim the string
            v = v.strip()
        kvp.append((k.strip(), v))
    return kvp
s = 'Division : First; Grand Total: 3861; Grand Max Total: 4600'
kvp = extract_kv_pairs(s)
numeric_values = [p for p in kvp if isinstance(p[1], int)]
print(kvp)
# [('Division', 'First'), ('Grand Total', 3861), ('Grand Max Total', 4600)]
print(numeric_values)
# [('Grand Total', 3861), ('Grand Max Total', 4600)]
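If you only need the numbers and not their keys, a regular expression over the extracted text may be enough. This is a minimal sketch, separate from the function above: it assumes the text is exactly what get_text() returned in the question, and that every run of digits in it is a number you want. The names text and numbers are just for illustration.

```python
import re

# Text as returned by get_text() on the span (copied from the question).
text = 'Division : First; Grand Total: 3861; Grand Max Total: 4600'

# \d+ matches each run of consecutive digits; findall returns them as strings,
# so convert each match to int.
numbers = [int(n) for n in re.findall(r'\d+', text)]

print(numbers)
# [3861, 4600]
```

Note that this picks up any digits in the string, so it is only a good fit when the labels themselves never contain numbers; the key-value approach above is safer in that case.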
Upvotes: 1