SittingNap

Reputation: 21

Remove HTML tags and unwanted information in Python With BeautifulSoup

I am pretty new to Python and I am parsing the contents of a webpage with BeautifulSoup. The webpage is https://www.ranker.com/crowdranked-list/the-greatest-rappers-of-all-time, if that matters. I want to make a list of the top 25 rappers. I managed to find the tags that hold the rappers' names, but I cannot get rid of the HTML tags and other nested information. Is there a way to iterate over the list so that only the name of each artist is displayed?

Here is my code:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.ranker.com/crowdranked-list/the-greatest-rappers-of-all-time')
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.find_all('meta', attrs={'itemprop': 'name'})
results
[<meta content="Ranker" itemprop="name"/>,
 <meta content="Eminem" itemprop="name"/>,
 <meta content="Tupac" itemprop="name"/>,
 <meta content="The Notorious B.I.G." itemprop="name"/>,
 <meta content="Kendrick Lamar" itemprop="name"/>,
 <meta content="Nas" itemprop="name"/>,
 <meta content="Dr. Dre" itemprop="name"/>,
 <meta content="Lil Wayne" itemprop="name"/>,
 <meta content="J. Cole" itemprop="name"/>,
 <meta content="Ice Cube" itemprop="name"/>,
 <meta content="Snoop Dogg" itemprop="name"/>,
 <meta content="Jay-Z" itemprop="name"/>,
 <meta content="Kanye West" itemprop="name"/>,
 <meta content="André 3000" itemprop="name"/>,
 <meta content="50 Cent" itemprop="name"/>,
 <meta content="Eazy-E" itemprop="name"/>,
 <meta content="DMX" itemprop="name"/>,
 <meta content="Drake" itemprop="name"/>,
 <meta content="ASAP Rocky" itemprop="name"/>,
 <meta content="Busta Rhymes" itemprop="name"/>,
 <meta content="Kid Cudi" itemprop="name"/>,
 <meta content="Ghostface Killah" itemprop="name"/>,
 <meta content="Chance the Rapper" itemprop="name"/>,
 <meta content="Childish Gambino" itemprop="name"/>,
 <meta content="Nate Dogg" itemprop="name"/>,
 <meta content="Logic" itemprop="name"/>]

Basically, I want the following output for all 25 artists. So far it only works for a single item in the list:

first_result = results[1]

print(first_result['content'])

Eminem

Upvotes: 2

Views: 335

Answers (1)

JayPeerachai

Reputation: 3842

Loop over the results and call .get('content') on each tag:

from bs4 import BeautifulSoup
import requests

r = requests.get('https://www.ranker.com/crowdranked-list/the-greatest-rappers-of-all-time')

soup = BeautifulSoup(r.text, 'html.parser')
results = soup.find_all('meta', attrs={'itemprop': 'name'})
# get the names of the rappers
names = [result.get('content') for result in results]
names

output:

['Ranker',
 'Eminem',
 'Tupac',
 'The Notorious B.I.G.',
 'Kendrick Lamar',
 'Nas',
 ...
 'Logic']

Upvotes: 1
