A.Dumas

Reputation: 3267

How to extract text and HTML tags from a string while keeping their order

I want to process a string that contains text mixed with HTML tags.

Consider the string:

str = "before <b>This text is bold</b> after. <i>italic</i>"

To give more context: I use a PIL ImageDraw object to draw wrapped text within a specified width. Part of the code looks as follows:

  rect = Rectangle(x,y,width,height)
  curx = rect.x
  cury = rect.y
  for word in allWords:
    wordWidth, wordHeight = font.getsize(word + " ")
    if (curx + wordWidth > rect.x + rect.width):
      cury += line_height
      curx = rect.x
    draw.text((curx, cury), word, ImageColor.getcolor(hex, "RGB"), font=font)
    curx += wordWidth

Of course, the string str can vary. Moreover, using BeautifulSoup's previousSibling and nextSibling is difficult precisely because the string can vary.

How would I handle this so that each piece of text is drawn with the proper font and text style?

Upvotes: 1

Views: 140

Answers (1)

folen gateis

Reputation: 2012

Use BeautifulSoup's children iterator. It yields the text nodes and the tags in the order they appear:

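from bs4 import BeautifulSoup

data = "before <b>This text is bold</b> after. <i>italic</i>"
# The lxml parser wraps a bare fragment in <html><body><p>, so soup.p holds the whole string
soup = BeautifulSoup(data, 'lxml')
# .children yields text nodes and tags in document order
for child in soup.p.children:
    print(child)

# Output:
# before 
# <b>This text is bold</b>
#  after. 
# <i>italic</i>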
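
If the goal is to feed the drawing loop one word at a time together with its style, something along these lines might help. This is only a rough sketch and it swaps BeautifulSoup for the standard library's html.parser so it has no dependencies. The names TAG_STYLES, StyledWordParser and styled_words are made up for illustration, and the tag-to-style mapping is an assumption about which tags matter in your strings.

```python
from html.parser import HTMLParser

# Hypothetical mapping from tag name to a style label; extend as needed.
TAG_STYLES = {"b": "bold", "strong": "bold", "i": "italic", "em": "italic"}


class StyledWordParser(HTMLParser):
    """Collect (word, styles) pairs in document order."""

    def __init__(self):
        super().__init__()
        self.open_styles = []  # style labels of the tags currently open
        self.words = []        # (word, frozenset of style labels) pairs

    def handle_starttag(self, tag, attrs):
        # Remember the style of every recognised tag we enter.
        if tag in TAG_STYLES:
            self.open_styles.append(TAG_STYLES[tag])

    def handle_endtag(self, tag):
        # Forget one occurrence of the style when its tag closes.
        style = TAG_STYLES.get(tag)
        if style in self.open_styles:
            self.open_styles.remove(style)

    def handle_data(self, data):
        # Every word inherits the styles of all tags open at this point.
        styles = frozenset(self.open_styles)
        for word in data.split():
            self.words.append((word, styles))


def styled_words(markup):
    """Return the words of `markup` in order, each with its active styles."""
    parser = StyledWordParser()
    parser.feed(markup)
    parser.close()
    return parser.words


if __name__ == "__main__":
    data = "before <b>This text is bold</b> after. <i>italic</i>"
    for word, styles in styled_words(data):
        print(word, sorted(styles))
```

In the wrapping loop from the question you could then iterate over styled_words(str) instead of allWords and pick the font from the style set, for example through a dict of ImageFont objects keyed by frozenset (that lookup is not shown here and depends on which font files you have). Because the styles are tracked as a stack, nested tags such as <b>bold <i>both</i></b> should come out with both labels on the inner words.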

Upvotes: 2
