Reputation: 35
Apologies if this has been asked before; none of the solutions I have tried worked.
I have written a program in which the user enters a word, and the program pulls an example of that word from the Dictionary.com website.
The keyword in the example is always surrounded by HTML tags, and I want to remove them. How would I go about doing this?
import requests
from bs4 import BeautifulSoup

word = input("Enter a word: ")
webContent = requests.get('https://www.dictionary.com/browse/' + word)
soup = BeautifulSoup(webContent.text, 'html.parser')
results = soup.find_all('p', attrs={'class': 'one-click-content css-it69we e15kc6du7'})
firstResult = results[0]
# .contents is a list of child nodes, so any Tag in it prints with its HTML markup
print(firstResult.contents[0:3])
Result:
Upvotes: 2
Views: 249
Reputation: 3173
Try this: you just need to call .getText() on the matched element, which returns its text with the tags stripped.
import requests
from bs4 import BeautifulSoup

word = input("Enter a word: ")
webContent = requests.get('https://www.dictionary.com/browse/' + word)
soup = BeautifulSoup(webContent.text, 'html.parser')
results = soup.find_all('p', attrs={'class': 'one-click-content css-it69we e15kc6du7'})
# Call getText() on the matched paragraph; soup.find('p') would return the first <p> on the page instead
result = results[0].getText()
print(result)
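To see what the call does without hitting the network, here is a minimal sketch that runs on an inline snippet. The sample markup, the shortened class name, and the helper name first_example_text are assumptions for illustration; the real page structure on Dictionary.com may differ and can change over time. get_text() is the snake_case spelling of getText().

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the fetched page; not the site's actual markup.
SAMPLE_HTML = (
    '<p class="one-click-content">'
    'She gave an <span class="italic">example</span> of her work.'
    '</p>'
)

def first_example_text(html):
    """Return the text of the first matching <p>, or None if there is no match."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches any element whose class list contains this value
    paragraph = soup.find("p", class_="one-click-content")
    if paragraph is None:
        return None
    # get_text() concatenates all descendant strings and drops the tags
    return paragraph.get_text()

text = first_example_text(SAMPLE_HTML)
print(text)
```

Matching on a single stable-looking class and checking for None is a design choice here: the generated class names in the original selector (css-it69we, e15kc6du7) look auto-generated and may not survive a site update, in which case results[0] would raise an IndexError.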
Upvotes: 0
Reputation: 368
import re

import requests
from bs4 import BeautifulSoup

word = input("Enter a word: ")
webContent = requests.get('https://www.dictionary.com/browse/' + word)
soup = BeautifulSoup(webContent.text, 'html.parser')
results = soup.find_all('p', attrs={'class': 'one-click-content css-it69we e15kc6du7'})
firstResult = results[0]
# Convert each child node to a string and strip anything that looks like an HTML tag
firstResult.contents = [re.sub('<[^<]+?>', '', str(x)) for x in firstResult.contents]
print(firstResult.contents[0:3])
Result:
Upvotes: 1
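As an alternative to the regular expression in the last answer, the standard library's html.parser can collect the text nodes directly. This is a different technique from the one posted, shown only as a sketch; the class name TextExtractor, the helper strip_tags, and the sample fragment are made up for illustration.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text nodes and ignores all tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; character references arrive already decoded
        self.parts.append(data)

def strip_tags(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)

fragment = 'an <span class="italic">example</span> &amp; more'
print(strip_tags(fragment))
```

Unlike the regex, this also decodes entities such as &amp;amp;, and it returns a single string rather than a list of pieces.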