I'm trying to create a program that can "find" specified HTML tags, and "replace" those tags with something else (working with HTML text imported as a string).
Disclaimer: I'm pretty new to Python, so I might be missing something obvious. Also, based on previous posts on similar topics, I've surmised that the regular expressions module (`re`) would likely suit this project best (I'll take suggestions for alternatives, though).
Here's my "input" text:
```html
<p align="left"><span style="font-family: Arial,Arial; font-size: 12px; color: #ffffff;">Example Company | Technical How-To</span></p>
```
Here's the "output" I want:
```html
<p>Example Company | Technical How-To</p>
```
Here's the "output" I actually get:
```html
</p>
```
Here's the Python code that produces that output:
```python
import re


def cleaner(raw_html):
    # Strip attributes from opening <p ...> tags
    cleantextp = re.sub('<p.*?>', '<p>', raw_html)
    # Remove opening <span ...> tags
    cleantextspan1 = re.sub('<span.*?>', '', cleantextp)
    # Intended to remove closing </span> tags
    cleantextspan2 = re.sub('<.*?/span>', '', cleantextspan1)
    return cleantextspan2


while True:
    print("Enter HTML Text Below")
    original = input("")
    if len(original) > 0:
        print(cleaner(original))
    else:
        print("Please try again")
```
The weird thing is that when I separate the function out and have it clean only one tag at a time, it seems to work. For example:
```python
import re


def cleaner(raw_html):
    # Only strip attributes from opening <p ...> tags
    cleantextp = re.sub('<p.*?>', '<p>', raw_html)
    return cleantextp


while True:
    print("Enter HTML Text Below")
    original = input("")
    if len(original) > 0:
        print(cleaner(original))
    else:
        print("Please try again")
```
This version produces the text below. The <span> tags are still there, which is expected because this version doesn't touch them, but it also doesn't collapse the output to </p>:
```html
<p><span style="font-family: Arial,Arial; font-size: 12px; color: #ffffff;">Example Company | Technical How-To</span></p>
```
So basically, I'm stuck. I've tried a few different methods, including defining a separate "clean" function for each tag and iterating the "input" text through each function in sequence, but I've not had any luck. Any suggestions?
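A minimal sketch (plain `re`, nothing to install) that reproduces the `</p>` result and shows one way the patterns could be tightened. The likely cause is the third pattern: a regex match starts at the leftmost position where it can succeed, so `<.*?/span>` begins at the `<` of `<p>`, and the lazy `.*?` only limits how far the match extends to the right, not where it starts. The `cleaner` below is a hypothetical rewrite, not the original code; it swaps `.*?` for `[^>]*` so a match cannot run past a tag's closing `>`, and it matches `</span>` literally.

```python
import re

SAMPLE = ('<p align="left"><span style="font-family: Arial,Arial; '
          'font-size: 12px; color: #ffffff;">Example Company | '
          'Technical How-To</span></p>')


def cleaner(raw_html: str) -> str:
    """Strip <p> attributes and remove <span> wrappers (regex sketch)."""
    # Opening <p ...> tags lose their attributes; \b leaves <pre> etc. alone.
    text = re.sub(r'<p\b[^>]*>', '<p>', raw_html)
    # Opening <span ...> tags are dropped; [^>]* cannot cross the tag's '>'.
    text = re.sub(r'<span\b[^>]*>', '', text)
    # Closing tags are matched literally, so nothing before them is consumed.
    text = re.sub(r'</span\s*>', '', text)
    return text


# The question's third pattern, applied to the text left after the first two
# steps: the match begins at the '<' of '<p>' and stretches to '/span>',
# so everything except the final '</p>' is deleted.
after_two_steps = '<p>Example Company | Technical How-To</span></p>'
print(re.sub('<.*?/span>', '', after_two_steps))  # </p>

print(cleaner(SAMPLE))  # <p>Example Company | Technical How-To</p>
```

Patterns like these are still fragile on real-world HTML; for example, a `>` inside a quoted attribute value would end `[^>]*` too early.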
Use Python's BeautifulSoup library (you need to install it first, e.g. `pip install beautifulsoup4`).
The web is full of examples showing how to find exactly the tags you need.
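A minimal sketch of that approach for the input in the question, assuming the `beautifulsoup4` package and Python's built-in `"html.parser"` backend. `clean_html` is a hypothetical helper name; it replaces each `<span>` with its contents via `unwrap()` and then clears the attributes of every `<p>`.

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4


def clean_html(raw_html: str) -> str:
    soup = BeautifulSoup(raw_html, "html.parser")
    # Replace each <span> with its own contents (text and child tags are kept).
    for span in soup.find_all("span"):
        span.unwrap()
    # Drop every attribute (align="left", etc.) from the <p> tags.
    for p in soup.find_all("p"):
        p.attrs = {}
    return str(soup)


sample = ('<p align="left"><span style="font-family: Arial,Arial; '
          'font-size: 12px; color: #ffffff;">Example Company | '
          'Technical How-To</span></p>')
print(clean_html(sample))  # <p>Example Company | Technical How-To</p>
```

Because the parser works on a tag tree rather than on raw text, there is no pattern that can accidentally span from one tag into another.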