Myzel394

Reputation: 1317

Django - Create Image objects from an Article's HTML

I have an Article model that stores its text as HTML, and an Image model with a field article = ForeignKey(Article). Whenever the HTML of an Article contains images, they should be extracted and saved as Image objects. I have written a method, create_images_from_tags, that searches for img tags using BeautifulSoup and saves them.

Unfortunately, this doesn't work and I get this error:

ValueError: save() prohibited to prevent data loss due to unsaved related object 'Article'.

Here's the save method of my Article model:

def save(self, *args, **kwargs):
    self.html = self.create_images_from_tags(self.html)

    return super().save(*args, **kwargs)

Calling the method after super().save() ends in an endless loop, because I would have to save the model again afterwards.
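
For reference, the loop only starts if the second write goes through the overridden save() again. Below is a rough, framework-free sketch of that idea, not actual Django code: BaseModel is a hypothetical stand-in for django.db.models.Model that only records writes, and process_html is a placeholder for create_images_from_tags.

```python
class BaseModel:
    """Hypothetical stand-in for django.db.models.Model; it only records writes."""

    def __init__(self):
        self.pk = None
        self.writes = []  # one entry per simulated database write

    def save(self):
        if self.pk is None:
            self.pk = 1  # pretend the database assigned a primary key
        self.writes.append(self.html)


class Article(BaseModel):
    def __init__(self, html):
        super().__init__()
        self.html = html

    def process_html(self, html):
        # Placeholder for create_images_from_tags; it needs self.pk to be set.
        assert self.pk is not None
        return html.replace("<img ", '<img loading="lazy" ', 1)

    def save(self):
        # First write: the article gets its primary key.
        super().save()
        # The second write calls the parent class directly, so this
        # overridden save() is not entered again and no loop can start.
        self.html = self.process_html(self.html)
        super().save()


article = Article('<p><img src="/media/a.png"></p>')
article.save()
print(len(article.writes))  # number of simulated writes
```

In a real model the second call would presumably be something like super().save(update_fields=["html"]), so that only the changed column is written.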

EDIT

My create_images_from_tags method:

def create_images_from_tags(self, html: str) -> str:
    """
    Creates Image objects from the HTML. Searches for img elements using BeautifulSoup.

    First checks whether the image already exists (using the 'data-image-id' attribute of the img tag).
    If not, an Image object is created and its id is stored on the tag in the 'data-image-id' attribute.
    The 'loading' attribute of the img tag is set to 'lazy'.

    The following attributes of the img tags are extracted and saved on the Image object:
        alt -> description
        data-name -> name
        src -> path (converted to an absolute path using BASE_DIR from the settings)
    The following static values are saved on the Image object:
        Article -> self (the current article)
        reduced_information -> True
        from_article -> True

    :param html: Old HTML of the article
    :return: New HTML
    """
    soup = BeautifulSoup(html, "html.parser")

    for element in soup.find_all("img"):
        image_id = element.get("data-image-id", None)

        try:
            Image.objects.get(id=image_id)
        except ObjectDoesNotExist:
            src = element["src"]
            description = element.get("alt", " ")
            name = str(element.get(
                "data-name",
                escape(f"Ein Bild vom Artikel \"{self.short_title}\"")
            ))
            # If src is relative, make full path
            if src.startswith("/"):
                path = os.path.join(settings.BASE_DIR, src[1:]).replace("\\", "/")
            else:
                path = src

            image = Image.objects.create(
                description=description,
                name=name,

                _original=path,
                Article=self,

                reduced_information=True,
                from_article=True,
            )

            element["data-image-id"] = image.id
            element["loading"] = "lazy"
    return str(soup)
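
The attribute mapping from the docstring can be tried out without Django or BeautifulSoup. The sketch below is only an illustration under assumptions: it uses the standard library's html.parser instead of BeautifulSoup, and ImgCollector, image_fields and base_dir are hypothetical names that do not appear in my project.

```python
import posixpath
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collects the attributes of every <img> tag as a dict."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images.append(dict(attrs))


def image_fields(html, base_dir):
    """Map img attributes to the field names listed in the docstring."""
    collector = ImgCollector()
    collector.feed(html)
    collector.close()
    result = []
    for attrs in collector.images:
        src = attrs.get("src") or ""
        # A src starting with "/" is treated as relative to base_dir.
        path = posixpath.join(base_dir, src[1:]) if src.startswith("/") else src
        result.append({
            "description": attrs.get("alt", " "),
            "name": attrs.get("data-name"),
            "path": path,
        })
    return result


html = '<p><img src="/media/a.png" alt="A cat" data-name="cat"></p>'
print(image_fields(html, "/srv/app"))
```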

Upvotes: 0

Views: 97

Answers (2)

Nalin Dobhal

Reputation: 2342

You can use a post_save signal. It fires after the article has been saved, so the Image objects can reference it. Process the HTML in your method and save it.

def save(self, *args, **kwargs):
    self.html = self.create_images_from_tags(self.html)
    return super().save(*args, **kwargs)

create_images_from_tags method

def create_images_from_tags(self, html: str) -> str:
    # Do the processing needed to produce the new HTML.
    ...

post_save signal

# Import your Article and Image models here.
import os

from bs4 import BeautifulSoup
from django.conf import settings
from django.core.exceptions import ObjectDoesNotExist
from django.db.models.signals import post_save
from django.dispatch import receiver
from django.utils.html import escape  # assumed to be the escape() used in the question


@receiver(post_save, sender=Article)
def create_article_images(sender, instance, **kwargs):
    soup = BeautifulSoup(instance.html, "html.parser")

    for element in soup.find_all("img"):
        image_id = element.get("data-image-id", None)
        try:
            Image.objects.get(id=image_id)
        except ObjectDoesNotExist:
            src = element["src"]
            description = element.get("alt", " ")
            # There is no self in a signal handler; use the saved instance.
            name = str(element.get(
                "data-name",
                escape(f"Ein Bild vom Artikel \"{instance.short_title}\"")
            ))
            # If src is relative, make it a full path
            if src.startswith("/"):
                path = os.path.join(settings.BASE_DIR, src[1:]).replace("\\", "/")
            else:
                path = src
            image = Image.objects.create(
                description=description, name=name,
                _original=path, Article=instance,
                reduced_information=True, from_article=True,
            )
    # rest of the function ....

For details, see the tutorial "How to Create Django Signals".

Upvotes: 1

Myzel394

Reputation: 1317

SOLUTION

I added a method that checks whether the images of the article have already been created. This is my new code:

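def save(self, *args, **kwargs):
    # Save first so the article has a primary key that Image objects can reference
    super().save(*args, **kwargs)

    # If the images were not added yet, add them and store the updated HTML.
    # Calling super().save() instead of self.save() avoids re-entering this method.
    if not self.check_images_already_created(self.html):
        self.html = self.create_images_from_tags(self.html)
        super().save(update_fields=["html"])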

def check_images_already_created(self, html: str) -> bool:
    """
    Checks whether the images of the article's HTML were already created.

    :param html: HTML of article
    :return: Whether the images were already created
    """
    # Ids of all images that were created from this article
    images_ids = set(self.images.filter(from_article=True).values_list("id", flat=True).distinct())
    found_ids = set()

    if images_ids:
        soup = BeautifulSoup(html, "html.parser")

        for img in soup.find_all("img"):
            if img.has_attr("data-image-id"):
                try:
                    # Attribute values are strings; ids are assumed to be integers
                    image_id = int(img["data-image-id"])
                except ValueError:
                    continue

                if image_id in images_ids:
                    found_ids.add(image_id)

        if images_ids == found_ids:
            return True
    return False
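
One pitfall with this kind of check: values read from HTML attributes are strings, while database ids are usually integers, so "3" in {3} is False. Here is a minimal, dependency-free sketch of the comparison, assuming integer ids; it uses the standard library's html.parser instead of BeautifulSoup, and ImageIdCollector and all_images_referenced are hypothetical names for illustration.

```python
from html.parser import HTMLParser


class ImageIdCollector(HTMLParser):
    """Collects integer data-image-id values from <img> tags."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        value = dict(attrs).get("data-image-id")
        try:
            # Attribute values are strings, so convert before comparing.
            self.ids.add(int(value))
        except (TypeError, ValueError):
            # Attribute missing or not a number: ignore this tag.
            pass


def all_images_referenced(html, stored_ids):
    """Return True if stored_ids is non-empty and every id appears in the HTML."""
    collector = ImageIdCollector()
    collector.feed(html)
    collector.close()
    return bool(stored_ids) and set(stored_ids) <= collector.ids


html = '<img data-image-id="3"><img data-image-id="7"><img src="/x.png">'
print(all_images_referenced(html, {3, 7}))
```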

Maybe this will help someone.

Upvotes: 0
