Reputation: 1317
I have an Article model with html as the text, and an Image model with a field article = ForeignKey(Article). If images are added to the html of an Article, they should be extracted and added as objects to the Image model. I have written a function, create_images_from_tags, that searches for img tags using BeautifulSoup and saves them.
Unfortunately, this doesn't work and I get this error:
ValueError: save() prohibited to prevent data loss due to unsaved related object 'Article'.
Here's the save function of my Article model:
def save(self, *args, **kwargs):
    self.html = self.create_images_from_tags(self.html)
    return super().save(*args, **kwargs)
Calling the function after super().save() ends up in an endless loop, because I would then have to save the model again.
EDIT
My create_images_from_tags function:
# Imports needed at the top of models.py
import os

from bs4 import BeautifulSoup
from django.conf import settings
from django.core.exceptions import ObjectDoesNotExist
from django.utils.html import escape

def create_images_from_tags(self, html: str) -> str:
    """
    Creates Image objects from the HTML. Uses BeautifulSoup to search for img (HTML) elements.
    First checks whether the image already exists (using the 'data-image-id' attribute on the img tag).
    If not, an Image object is created and its id is saved on the tag in the `data-image-id`
    attribute.
    The img tag's 'loading' attribute is set to 'lazy'.
    The following attributes of the img tags are extracted and saved on the Image object:
        alt -> description
        data-name -> name
        src -> path (converted to an absolute path using BASE_DIR from the settings if it starts with "/")
    The following static values are saved on the Image object:
        Article -> self (the current article)
        reduced_information -> True
        from_article -> True
    :param html: Old HTML of the article
    :return: New HTML
    """
    soup = BeautifulSoup(html, "html.parser")
    for element in soup.find_all("img"):
        image_id = element.get("data-image-id", None)
        try:
            Image.objects.get(id=image_id)
        except ObjectDoesNotExist:
            src = element["src"]
            description = element.get("alt", " ")
            name = str(element.get(
                "data-name",
                escape(f"Ein Bild vom Artikel \"{self.short_title}\"")
            ))
            # If src is relative, make full path
            if src.startswith("/"):
                path = os.path.join(settings.BASE_DIR, src[1:]).replace("\\", "/")
            else:
                path = src
            image = Image.objects.create(
                description=description,
                name=name,
                _original=path,
                Article=self,
                reduced_information=True,
                from_article=True,
            )
            element["data-image-id"] = image.id
            element["loading"] = "lazy"
    return str(soup)
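The extraction part of this function (collect each img tag's attributes and turn a root-relative src into a path under BASE_DIR) can be tried without Django. The sketch below is an illustration only: it swaps BeautifulSoup for the standard library's html.parser.HTMLParser, skips the database lookup by treating any tag that already has a data-image-id as done, and the names ImgTagCollector, resolve_src and images_to_create are hypothetical.

```python
import os
from html.parser import HTMLParser
from typing import Dict, List, Optional


class ImgTagCollector(HTMLParser):
    """Collects the attribute dict of every <img> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.images: List[Dict[str, Optional[str]]] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lower-cased.
        # A self-closing <img ... /> also ends up here via handle_startendtag.
        if tag == "img":
            self.images.append(dict(attrs))


def resolve_src(src: str, base_dir: str) -> str:
    """Same rule as the question: a src starting with "/" is joined onto base_dir."""
    if src.startswith("/"):
        return os.path.join(base_dir, src[1:]).replace("\\", "/")
    return src


def images_to_create(html: str, base_dir: str, default_name: str) -> List[Dict[str, Optional[str]]]:
    """Field values for every <img> without a data-image-id (no database lookup here)."""
    collector = ImgTagCollector()
    collector.feed(html)
    collector.close()
    pending = []
    for attrs in collector.images:
        if attrs.get("data-image-id"):
            continue  # assumed to be linked to an existing Image already
        pending.append({
            "description": attrs.get("alt", " "),
            "name": attrs.get("data-name", default_name),
            "path": resolve_src(attrs["src"], base_dir),
        })
    return pending


html = ('<p><img src="/media/cat.png" alt="A cat">'
        '<img src="https://example.com/dog.png" data-image-id="7"></p>')
print(images_to_create(html, "/srv/site", "Ein Bild"))
# [{'description': 'A cat', 'name': 'Ein Bild', 'path': '/srv/site/media/cat.png'}]
```

Only the first tag is returned, because the second one already carries a data-image-id.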
Upvotes: 0
Views: 97
Reputation: 2342
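You can use a post_save signal. Process the html in your save() method, and create the Image objects in a post_save receiver: it runs after the article has been saved, so the article already has a primary key that the images can reference.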
def save(self, *args, **kwargs):
    self.html = self.create_images_from_tags(self.html)
    return super().save(*args, **kwargs)
create_images_from_tags method
def create_images_from_tags(self, html: str) -> str:
    # do the processing etc. to get the new html
    return html  # placeholder: return the processed HTML here
post_save signal
# Also import your Article and Image models.
import os

from bs4 import BeautifulSoup
from django.conf import settings
from django.core.exceptions import ObjectDoesNotExist
from django.db.models.signals import post_save
from django.dispatch import receiver
from django.utils.html import escape

@receiver(post_save, sender=Article)
def create_article_images(sender, instance, **kwargs):
    soup = BeautifulSoup(instance.html, "html.parser")
    for element in soup.find_all("img"):
        image_id = element.get("data-image-id", None)
        try:
            Image.objects.get(id=image_id)
        except ObjectDoesNotExist:
            src = element["src"]
            description = element.get("alt", " ")
            # The receiver has no self; the saved article is `instance`
            name = str(element.get(
                "data-name",
                escape(f"Ein Bild vom Artikel \"{instance.short_title}\"")))
            # If src is relative, make full path
            if src.startswith("/"):
                path = os.path.join(settings.BASE_DIR, src[1:]).replace("\\", "/")
            else:
                path = src
            image = Image.objects.create(
                description=description, name=name, _original=path, Article=instance,
                reduced_information=True, from_article=True,
            )
            # rest of the function ....
Follow this tutorial for detailed information on how to create Django signals.
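One thing to watch in the elided "rest of the function": if the receiver writes the updated HTML back with instance.save(), post_save fires again and the receiver runs again. In Django, QuerySet.update() (for example Article.objects.filter(pk=instance.pk).update(html=...)) writes to the database without calling save() and without sending pre_save/post_save. The toy model below only illustrates that control flow in plain Python; ToySignal, ToyArticle, write_html_only and tag_images are hypothetical stand-ins, not Django's classes.

```python
from typing import Callable, List


class ToySignal:
    """Minimal stand-in for a post-save signal (NOT Django's Signal class)."""

    def __init__(self) -> None:
        self.receivers: List[Callable] = []

    def connect(self, receiver: Callable) -> None:
        self.receivers.append(receiver)

    def send(self, instance) -> None:
        for receiver in self.receivers:
            receiver(instance)


toy_post_save = ToySignal()


class ToyArticle:
    def __init__(self, html: str) -> None:
        self.pk = None
        self.html = html
        self.save_calls = 0

    def save(self) -> None:
        self.save_calls += 1
        if self.pk is None:
            self.pk = 1  # the row exists from here on, so related rows can point at it
        toy_post_save.send(self)  # like post_save: fired after every save()

    def write_html_only(self, html: str) -> None:
        # Plays the role of QuerySet.update(): stores the value without
        # calling save(), so the signal is not sent again.
        self.html = html


def tag_images(instance: ToyArticle) -> None:
    # Hypothetical receiver: by the time it runs, instance.pk is already set.
    if 'loading="lazy"' not in instance.html:
        new_html = instance.html.replace("<img ", '<img loading="lazy" ')
        # Calling instance.save() here would fire the signal again;
        # writing only the html avoids re-entering save().
        instance.write_html_only(new_html)


toy_post_save.connect(tag_images)

article = ToyArticle('<p><img src="/media/a.png"></p>')
article.save()
print(article.save_calls, article.html)
```

save() runs once, and the receiver updates the HTML without triggering a second round of signals.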
Upvotes: 1
Reputation: 1317
SOLUTION
I created a method that checks whether the article's images have already been created. Here's my new code:
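def save(self, *args, **kwargs):
    # Save first, so the article has a primary key the Image objects can reference
    super().save(*args, **kwargs)
    # If the images have not been added yet, add them and store the updated HTML.
    # Calling super().save() with update_fields instead of self.save() avoids the endless loop.
    if not self.check_images_already_created(self.html):
        self.html = self.create_images_from_tags(self.html)
        super().save(update_fields=["html"])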
def check_images_already_created(self, html: str) -> bool:
    """
    Checks whether the images of the article's HTML were already created.
    :param html: HTML of article
    :return: Whether the images were already created
    """
    # Ids of all images created from this article, converted to strings,
    # because HTML attribute values (data-image-id) are always strings
    images_ids = {
        str(image_id)
        for image_id in self.images.filter(from_article=True).values_list("id", flat=True)
    }
    soup = BeautifulSoup(html, "html.parser")
    for img in soup.find_all("img"):
        # An img tag without a known data-image-id has not been turned into an Image yet
        if img.get("data-image-id") not in images_ids:
            return False
    return True
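The ordering that avoids both the "unsaved related object" error and the endless loop is: write the row first (so it has a primary key), then process the HTML, then persist only the changed column without calling save() again. Django's Model.save() does accept an update_fields argument for that last step. The toy below models just that ordering in plain Python; ToyArticle and _base_save are hypothetical stand-ins (not Django), and the string replace stands in for create_images_from_tags.

```python
import itertools
from typing import List, Optional


class ToyArticle:
    """Toy stand-in for a model (NOT Django): a row only gets a pk on its first write."""

    _pk_counter = itertools.count(1)

    def __init__(self, html: str) -> None:
        self.pk: Optional[int] = None
        self.html = html
        self.writes: List[str] = []  # log of simulated database writes

    def _base_save(self, update_fields: Optional[List[str]] = None) -> None:
        # Plays the role of super().save(): inserts (assigning a pk) or updates.
        if self.pk is None:
            self.pk = next(self._pk_counter)
        self.writes.append("all" if update_fields is None else ",".join(update_fields))

    def save(self) -> None:
        # 1) Write the row first, so related rows can reference self.pk.
        self._base_save()
        if "data-image-id" not in self.html:
            # 2) Stand-in for create_images_from_tags(): mark the <img> as processed.
            self.html = self.html.replace("<img ", f'<img data-image-id="{self.pk}" ', 1)
            # 3) Persist only the html column; save() is not called again, so no loop.
            self._base_save(update_fields=["html"])


article = ToyArticle('<p><img src="/media/a.png"></p>')
article.save()
article.save()  # HTML already marked: only the plain write happens
print(article.writes)  # ['all', 'html', 'all']
```

The first save() costs two writes, every later one a single write, and save() never calls itself.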
Maybe this helps someone.
Upvotes: 0