Reputation: 502
I'm looking for a way to replace from HTML file the "href" and "src" content of each line with other string. So, I need to replace this:
<img src="images/file.png" alt="">
With this:
<img src="..." alt="">
Actually I write a function that can convert files to base64. I need to search for href/src, take his file path content and replace it with the base64 version of the file, but I don't know how to.
Here the function to convert files to base64:
def filetoB64 (fpath=None,raw=False):
fstring = None
fmime = None
freturn = None
if fpath is not None:
if os.path.isfile(fpath):
fmime = mimetypes.MimeTypes().guess_type(fpath)[0]
if fmime in (filemimes['text'] + filemimes['image'] + filemimes['audio'] + filemimes['video']):
with open(fpath,'rb') as f:
fcontent = f.read()
fstring = base64.encodestring(fcontent).replace('\n','')
if raw:
freturn = fstring
else:
freturn = ''.join(['data:',fmime,';base64,',fstring])
else:
freturn = fpath
else:
freturn = fpath
return freturn
Upvotes: 0
Views: 66
Reputation: 9267
I'm assuming that your function to convert a file to base64
is fully working.
If you want to replace some tags in your HTML code, you can use regex
like this example:
import re
string = '<img src="images/file.png" href="http://wwww.linktoreplace.com", alt="">'
to_replace = re.findall('(\w+)="(.*?)"', string)
for k, v in to_replace:
if k == 'src':
string = re.sub(v, "src_replaced_by_this_string", string)
if k == 'href':
string = re.sub(v, "href_replaced_by_this_string", string)
print(string)
Output:
<img src="src_replaced_by_this_string" href="href_replaced_by_this_string", alt="">
Otherwise, you can use BeautifulSoup
which is a Python library for pulling data out of HTML and XML files.
Upvotes: 1