munchies

Reputation: 21

How can I change the code so that the HTML tags do not appear in the output?

from bs4 import BeautifulSoup
import requests

url = 'https://www.mediacorp.sg/en/your-mediacorp/our-artistes/tca/male-artistes/ayden-sng-12357686'

artiste_name = 'celeb-name'

page = requests.get(url)

soup = BeautifulSoup(page.text, 'lxml')

txt = soup.find_all('h1', attrs={'class':artiste_name})

print(txt)

With the above code, I get the output:

[<*h1 class="celeb-name">Ayden Sng</h1*>] #asterisks added to show h1 tags

What do I need to change in my code so that I only get 'Ayden Sng' as my output?

Upvotes: 0

Views: 45

Answers (2)

Mohammed

Reputation: 152

from bs4 import BeautifulSoup 
import requests

url = 'https://www.mediacorp.sg/en/your-mediacorp/our-artistes/tca/male-artistes/ayden-sng-12357686'

artiste_name = 'celeb-name'

page = requests.get(url)

soup = BeautifulSoup(page.text, 'lxml')

txt = soup.find_all('h1', attrs={'class':artiste_name})

print(txt[0].text)

If there is more than one result, you can use this code:

from bs4 import BeautifulSoup 
import requests

url = 'https://www.mediacorp.sg/en/your-mediacorp/our-artistes/tca/male-artistes/ayden-sng-12357686'

artiste_name = 'celeb-name'

page = requests.get(url)

soup = BeautifulSoup(page.text, 'lxml')

txt = soup.find_all('h1', attrs={'class':artiste_name})
for i in txt:
  print(i.text)
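
When only one heading is expected, a possible variation is to use find() instead of find_all() and guard against a missing match. The sketch below is only an illustration: it parses a hypothetical inline HTML string rather than the live page, and it uses the built-in 'html.parser' instead of 'lxml' so that it runs without extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical inline markup standing in for the downloaded page.
html = '<html><body><h1 class="celeb-name"> Ayden Sng </h1></body></html>'

# 'html.parser' ships with Python; the code above uses 'lxml' instead.
soup = BeautifulSoup(html, 'html.parser')

# find() returns the first matching tag, or None when nothing matches.
heading = soup.find('h1', class_='celeb-name')

# get_text(strip=True) returns the text with surrounding whitespace removed.
name = heading.get_text(strip=True) if heading is not None else None
print(name)
```

The None check avoids an AttributeError if the page layout changes and the heading is no longer found.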

Upvotes: 0

esqew

Reputation: 44698

find_all() returns a list of Tag objects, so printing the list shows their markup. Iterate over each entry of the txt list and extract its text property:

txt = [element.text for element in txt] # ['Ayden Sng']
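
As a rough, self-contained illustration of the difference between a tag and its text, the sketch below runs the same comprehension on a hypothetical inline HTML string. The second heading ("Another Artiste") and the variable names are made up for the example, and 'html.parser' is used in place of 'lxml' to keep it dependency-free.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two matching headings; not the real page.
html = '''
<h1 class="celeb-name">Ayden Sng</h1>
<h1 class="celeb-name">Another Artiste</h1>
'''

soup = BeautifulSoup(html, 'html.parser')
elements = soup.find_all('h1', attrs={'class': 'celeb-name'})

# Converting a Tag to a string gives its full markup, tags included.
markup = str(elements[0])

# The .text property gives only the text inside each tag.
names = [element.text for element in elements]

print(markup)
print(names)
```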


Upvotes: 1
