CEsmonde

Reputation: 35

How to remove HTML tags from output text?

Apologies if this has been asked before, but none of the solutions I have tried worked.

I have created a program where the user enters a word, and the program pulls an example of that word from the Dictionary.com website.

I want to remove the HTML tags that always surround the keyword. How would I go about doing this?

import requests

word = input("Enter a word: ")

webContent = requests.get('https://www.dictionary.com/browse/'+word)

from bs4 import BeautifulSoup
soup = BeautifulSoup(webContent.text, 'html.parser')

results = soup.find_all('p', attrs={'class':'one-click-content css-it69we e15kc6du7'})

firstResult = results[0]
print(firstResult.contents[0:3])


Upvotes: 2

Views: 249

Answers (2)

Jessica

Reputation: 3173

Try this: you just need to call .getText() on the element you matched.

import requests
from bs4 import BeautifulSoup

word = input("Enter a word: ")
webContent = requests.get('https://www.dictionary.com/browse/' + word)
soup = BeautifulSoup(webContent.text, 'html.parser')

results = soup.find_all('p', attrs={'class': 'one-click-content css-it69we e15kc6du7'})

# Call getText() on the matched paragraph, not on the first <p> in the whole page
result = results[0].getText()
print(result)
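
To see what get_text() does without hitting the network, here is a minimal sketch on a made-up HTML snippet. The markup, the `keyword` class, and the helper name `extract_text` are assumptions for illustration only; they are not taken from the actual Dictionary.com page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one example sentence on the page.
html = '<p class="one-click-content">She gave a <span class="keyword">happy</span> smile.</p>'


def extract_text(markup):
    """Return the text of the first matching <p>, with all tags dropped."""
    soup = BeautifulSoup(markup, "html.parser")
    # class_ matches a single CSS class even when the element has several.
    paragraph = soup.find("p", class_="one-click-content")
    if paragraph is None:
        # No matching element, e.g. the page layout changed.
        return None
    return paragraph.get_text()


print(extract_text(html))
```

Checking for None first avoids an error when no paragraph matches, which is likely to happen if the site changes its generated class names.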

Upvotes: 0

nandu kk

Reputation: 368

import re

import requests
from bs4 import BeautifulSoup

word = input("Enter a word: ")

webContent = requests.get('https://www.dictionary.com/browse/' + word)

soup = BeautifulSoup(webContent.text, 'html.parser')

results = soup.find_all('p', attrs={'class': 'one-click-content css-it69we e15kc6du7'})

firstResult = results[0]
# Convert each child node to a string and strip anything that looks like a tag
firstResult.contents = [re.sub('<[^<]+?>', '', str(x)) for x in firstResult.contents]
print(firstResult.contents[0:3])
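
The regex step can also be tried on its own, on plain strings. This is a rough sketch using only the standard library: the helper name `strip_tags` and the sample sentence are made up, and the `html.unescape` call is an extra step that is not part of the code above. A regex like this is fine for simple, well-formed fragments but can misfire on malformed HTML, so a parser is usually the safer choice.

```python
import html
import re

# Same pattern as above: "<", then one or more non-"<" characters, then ">".
TAG_PATTERN = re.compile(r'<[^<]+?>')


def strip_tags(fragment):
    """Remove tag-like substrings, then decode entities such as &amp;."""
    without_tags = TAG_PATTERN.sub('', fragment)
    return html.unescape(without_tags)


sample = 'She gave a <span class="keyword">happy</span> smile.'
print(strip_tags(sample))
```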


Upvotes: 1
