junihh

Reputation: 502

How do I extract part of a string and replace it with another one?

I'm looking for a way to replace the "href" and "src" values on each line of an HTML file with another string. For example, I need to replace this:

<img src="images/file.png" alt="">

With this:

<img src="..." alt="">

I have already written a function that converts files to base64. I need to search for href/src, take the file path it contains, and replace it with the base64 version of that file, but I don't know how.

Here is the function that converts files to base64:

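import base64
import mimetypes
import os

# filemimes is not shown in this post; from its use below it is presumably a
# dict mapping 'text', 'image', 'audio' and 'video' to lists of MIME types.

def filetoB64(fpath=None, raw=False):
    fstring = None
    fmime = None
    freturn = None

    if fpath is not None:
        if os.path.isfile(fpath):
            fmime = mimetypes.MimeTypes().guess_type(fpath)[0]

            if fmime in (filemimes['text'] + filemimes['image'] + filemimes['audio'] + filemimes['video']):
                with open(fpath, 'rb') as f:
                    fcontent = f.read()
                    # base64.encodestring was removed in Python 3.9;
                    # b64encode returns bytes without embedded newlines.
                    fstring = base64.b64encode(fcontent).decode('ascii')

                    if raw:
                        freturn = fstring
                    else:
                        freturn = ''.join(['data:', fmime, ';base64,', fstring])
            else:
                # Unsupported MIME type: hand the path back unchanged.
                freturn = fpath
        else:
            # Not an existing file (e.g. a URL): hand the path back unchanged.
            freturn = fpath

    return freturn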

Upvotes: 0

Views: 66

Answers (1)

Chiheb Nexus

Reputation: 9267

I'm assuming that your function for converting a file to base64 already works.

If you want to replace attribute values in your HTML code, you can use a regex, as in this example:

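import re

string = '<img src="images/file.png" href="http://wwww.linktoreplace.com", alt="">'

# Collect every name="value" pair as (name, value) tuples.
to_replace = re.findall(r'(\w+)="(.*?)"', string)

for k, v in to_replace:
    # re.escape stops characters such as "." in the value from acting as regex metacharacters.
    if k == 'src':
        string = re.sub(re.escape(v), "src_replaced_by_this_string", string)
    if k == 'href':
        string = re.sub(re.escape(v), "href_replaced_by_this_string", string)

print(string)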

Output:

<img src="src_replaced_by_this_string" href="href_replaced_by_this_string", alt="">
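The loop above substitutes fixed strings. To plug in a conversion function instead, one option is a single re.sub call with a replacement callback. This is only a minimal sketch: inline_links and fake_convert are hypothetical names, fake_convert stands in for the filetoB64 function from the question, and the pattern assumes double-quoted attribute values.

```python
import re

# Matches src="..." or href="..." but not longer names such as data-src.
ATTR_RE = re.compile(r'(?<![\w-])(src|href)="([^"]*)"')


def inline_links(html, convert):
    """Replace every src/href value in html with convert(value)."""
    def _swap(match):
        name, value = match.group(1), match.group(2)
        # The string returned by a callback is inserted literally,
        # so backslashes in the converted value need no escaping.
        return '{}="{}"'.format(name, convert(value))

    return ATTR_RE.sub(_swap, html)


def fake_convert(path):
    # Hypothetical stand-in for the question's filetoB64(path).
    return "converted:" + path


if __name__ == "__main__":
    line = '<img src="images/file.png" alt="">'
    print(inline_links(line, fake_convert))
```

Passing filetoB64 as the convert argument should leave URLs and missing files untouched, since that function returns the original path whenever it cannot convert it.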

Otherwise, you can use BeautifulSoup, a Python library for pulling data out of HTML and XML files.
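To give a rough idea of the parser-based approach without installing anything, here is a sketch that uses the standard library's html.parser in place of BeautifulSoup. It is not BeautifulSoup's API, and it only collects the src/href values that would need converting; SrcHrefCollector and collect_links are hypothetical names.

```python
from html.parser import HTMLParser


class SrcHrefCollector(HTMLParser):
    """Collect src and href attribute values in document order."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag, attribute, value) tuples

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None
        # for attributes written without a value.
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.found.append((tag, name, value))


def collect_links(html):
    """Return every (tag, attribute, value) found in html."""
    parser = SrcHrefCollector()
    parser.feed(html)
    parser.close()
    return parser.found


if __name__ == "__main__":
    page = '<img src="images/file.png" alt=""><a href="style.css">x</a>'
    for tag, name, value in collect_links(page):
        print(tag, name, value)
```

Unlike the regex, a parser also copes with single-quoted values and attributes spread over several lines.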

Upvotes: 1
