Reputation: 19
I need to remove the tags and keep only the text in the output of the code below, using Python and BeautifulSoup.
import requests
from bs4 import BeautifulSoup as bs
r = requests.get("https://www.w3schools.com/html/html_intro.asp")
soup = bs(r.content, "html.parser")  # name the parser explicitly to avoid the "no parser specified" warning
print(soup.prettify())
first_header = soup.find("h2")
first_headers = soup.find_all("h2")
first_headers
Upvotes: 0
Views: 428
Reputation: 25048
To get only the text from your ResultSet, iterate over it (e.g. with a list comprehension), call .text on every element, and .join() all text elements with whitespace:
' '.join([e.text for e in soup.find_all('h2')])
import requests
from bs4 import BeautifulSoup as bs
r = requests.get("https://www.w3schools.com/html/html_intro.asp")
soup = bs(r.content, "html.parser")
first_headers = ' '.join([e.text for e in soup.find_all('h2')])
print(first_headers)
Tutorials References Exercises and Quizzes HTML Tutorial HTML Forms HTML Graphics HTML Media HTML APIs HTML Examples HTML References What is HTML? A Simple HTML Document What is an HTML Element? Web Browsers HTML Page Structure HTML History Report Error Thank You For Helping Us!
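The same approach can be tried without a network request. This is a minimal sketch, assuming a small made-up HTML snippet (the `html` string and its headings are illustrative, not copied from the page above). It uses `get_text(" ", strip=True)` instead of `.text` so that stray whitespace around each heading is trimmed, which may or may not matter for the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for the downloaded page.
html = """
<h1>HTML Introduction</h1>
<h2> What is HTML? </h2>
<p>Some paragraph text.</p>
<h2>Web <b>Browsers</b></h2>
"""

soup = BeautifulSoup(html, "html.parser")

# get_text() returns the text of a tag and its children, without the markup.
# The " " separator keeps words from nested tags apart; strip=True trims each piece.
headings = [h.get_text(" ", strip=True) for h in soup.find_all("h2")]

# Join the individual heading texts into one whitespace-separated string.
joined = " ".join(headings)
print(joined)
```

If you want the headings as separate items rather than one string, keep `headings` as a list and skip the join.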
Upvotes: 0
Reputation: 24
import requests
from bs4 import BeautifulSoup as bs
r = requests.get("https://www.w3schools.com/html/html_intro.asp")
soup = bs(r.content, features="html.parser")  # parse the downloaded page
# retrieve all h1 and h2 tags and extract the text from each of them
first_headers = [tag.text for tag in soup.find_all(["h1", "h2"])]
print(first_headers)
I used a list comprehension to do it in a single line. You can use a for loop instead:
import requests
from bs4 import BeautifulSoup as bs
r = requests.get("https://www.w3schools.com/html/html_intro.asp")
soup = bs(r.content, features="html.parser")
first_headers = soup.find_all(["h1", "h2"])
for header in first_headers:
    print(header.text)
This is the output of my code:
Tutorials
References
Exercises and Quizzes
HTML Tutorial
HTML Forms
HTML Graphics
HTML Media
HTML APIs
HTML Examples
HTML References
HTML Introduction
What is HTML?
A Simple HTML Document
What is an HTML Element?
Web Browsers
HTML Page Structure
HTML History
Report Error
Thank You For Helping Us!
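Note that this output contains "HTML Introduction", which a search for "h2" alone would not return, presumably because that heading is an h1 on the page. A small sketch of the difference, assuming a made-up three-heading snippet (the `html` string is illustrative only):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one h1 between two h2 headings.
html = "<h2>Tutorials</h2><h1>HTML Introduction</h1><h2>What is HTML?</h2>"
soup = BeautifulSoup(html, "html.parser")

# A single tag name matches only that tag.
only_h2 = [h.text for h in soup.find_all("h2")]

# A list of tag names matches any of them, in document order.
h1_and_h2 = [h.text for h in soup.find_all(["h1", "h2"])]

print(only_h2)
print(h1_and_h2)
```

Pick the single name or the list depending on whether the page title should be part of the result.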
Upvotes: 1