Tech

Reputation: 33

How to Modify Subelement with Beautiful Soup in Python 3

I need to pull the src and href attributes out of img and a tags respectively, change their values, and save the changes back to the original file. I am using Python 3 and Beautiful Soup. For context, I need to accomplish this for a series of files within a directory, so a simple find-and-replace will not do the trick. Here is the code I have at present:

from bs4 import BeautifulSoup

with open("file.html") as fp:
    soup = BeautifulSoup(fp, "lxml")

atags = soup.find_all("a", href=True)
imgtags = soup.find_all("img", src=True)

for a in atags:
    link = a.get("href")
    if link.find("http"):
        link = link.split("/")[-1]

        tmp = link.replace("%20", " ")
        link = tmp

        link = link.split("?")[0]

        a.get("href").replace_with(link)

        print(a)

for img in imgtags:
    pic = img.get("src")
    pic = pic.split("/")[-1]

    tmp = pic.replace("%20", " ")
    pic = tmp

    pic = pic.split("?")[0]

    img.get("src").replace_with(pic)

    print(img)

with open("file.html", "wb") as f_output:
    f_output.write(soup.prettify("utf-8"))

How can I do this in a way that will actually save?

Upvotes: 0

Views: 129

Answers (1)

Tech

Reputation: 33

After further research, I was able to modify the attributes I needed by replacing the lines

a.get("href").replace_with(link)
img.get("src").replace_with(pic)

with

a['href'] = link
img['src'] = pic
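
The assignment form works because a.get("href") returns a plain Python string, which has no replace_with method, whereas a Tag supports dictionary-style item assignment that updates the parsed tree. For anyone applying this to a whole directory, here is a minimal sketch of one way it could look. The function names (local_name, rewrite_links, rewrite_directory) and the directory argument are hypothetical, and it makes a few assumptions that differ from the question's code: it uses the built-in "html.parser" instead of "lxml", rewrites every href and src without the http check, uses urllib.parse instead of manual split and replace calls, and writes str(soup) rather than the prettified output.

```python
from pathlib import Path
from urllib.parse import unquote, urlsplit

from bs4 import BeautifulSoup


def local_name(url):
    """Return the decoded last path component of a URL, without query or fragment."""
    path = urlsplit(url).path                  # drops "?query" and "#fragment"
    return unquote(path.rsplit("/", 1)[-1])    # "%20" becomes " "


def rewrite_links(html):
    """Return the markup with each a[href] and img[src] reduced to a bare file name."""
    # Assumption: the stdlib parser is good enough; "lxml" also works if installed.
    soup = BeautifulSoup(html, "html.parser")
    for tag_name, attr in (("a", "href"), ("img", "src")):
        for tag in soup.find_all(tag_name, attrs={attr: True}):
            tag[attr] = local_name(tag[attr])  # item assignment modifies the tree
    return str(soup)


def rewrite_directory(directory):
    """Rewrite every .html file directly inside `directory`, in place."""
    for path in Path(directory).glob("*.html"):
        original = path.read_text(encoding="utf-8")
        path.write_text(rewrite_links(original), encoding="utf-8")
```

Calling rewrite_directory("some_folder") would then process each file in turn. Note that a URL ending in "/" is reduced to an empty string by this sketch, so links like that may need special handling. If the http check from the question is wanted, be aware that str.find returns 0 for a match at the start of the string and -1 for no match, so `if link.find("http"):` is true for every link that does not start with "http"; link.startswith("http") states the intended test more directly.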

Upvotes: 2
