Reputation: 21
I am pretty new to Python and I am parsing the contents of a webpage with BeautifulSoup. The webpage is https://www.ranker.com/crowdranked-list/the-greatest-rappers-of-all-time if that matters. I want to make a list of the top 25 rappers. I managed to find the elements that hold the rappers' names, but I cannot get rid of the HTML tags and other nested information. Is there a way to iterate over the list and display only the name of each artist?
Here is my code:
import requests
from bs4 import BeautifulSoup
r = requests.get('https://www.ranker.com/crowdranked-list/the-greatest-rappers-of-all-time')
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.find_all('meta', attrs={'itemprop': 'name'})
results
[<meta content="Ranker" itemprop="name"/>,
<meta content="Eminem" itemprop="name"/>,
<meta content="Tupac" itemprop="name"/>,
<meta content="The Notorious B.I.G." itemprop="name"/>,
<meta content="Kendrick Lamar" itemprop="name"/>,
<meta content="Nas" itemprop="name"/>,
<meta content="Dr. Dre" itemprop="name"/>,
<meta content="Lil Wayne" itemprop="name"/>,
<meta content="J. Cole" itemprop="name"/>,
<meta content="Ice Cube" itemprop="name"/>,
<meta content="Snoop Dogg" itemprop="name"/>,
<meta content="Jay-Z" itemprop="name"/>,
<meta content="Kanye West" itemprop="name"/>,
<meta content="André 3000" itemprop="name"/>,
<meta content="50 Cent" itemprop="name"/>,
<meta content="Eazy-E" itemprop="name"/>,
<meta content="DMX" itemprop="name"/>,
<meta content="Drake" itemprop="name"/>,
<meta content="ASAP Rocky" itemprop="name"/>,
<meta content="Busta Rhymes" itemprop="name"/>,
<meta content="Kid Cudi" itemprop="name"/>,
<meta content="Ghostface Killah" itemprop="name"/>,
<meta content="Chance the Rapper" itemprop="name"/>,
<meta content="Childish Gambino" itemprop="name"/>,
<meta content="Nate Dogg" itemprop="name"/>,
<meta content="Logic" itemprop="name"/>]
Basically, I want the output below for all 25 artists. It already works for a single item in the list:
first_result = results[1]
print(first_result['content'])
Eminem
Upvotes: 2
Views: 335
Reputation: 3842
Loop over the results and call .get('content') on each tag:
from bs4 import BeautifulSoup
import requests
r = requests.get('https://www.ranker.com/crowdranked-list/the-greatest-rappers-of-all-time')
soup = BeautifulSoup(r.text, 'html.parser')
results = soup.find_all('meta', attrs={'itemprop': 'name'})
# get the names of the rappers
names = [result.get('content') for result in results]
names
Output:
['Ranker', 'Eminem', 'Tupac', 'The Notorious B.I.G.', 'Kendrick Lamar', 'Nas', ... 'Logic']
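Note that the first match in the output is the site name ('Ranker'), not an artist, so the list holds 26 entries rather than 25. A minimal sketch of one way to drop it, assuming the site name always comes first: slice the list after extracting the names. The helper name extract_names, its skip and limit parameters, and the inline SAMPLE_HTML are hypothetical and only for illustration. The sample string stands in for r.text so the snippet runs without a network request.

```python
from bs4 import BeautifulSoup

# Small stand-in for r.text (assumption: same <meta itemprop="name"> layout
# as shown in the question, with the site name as the first match).
SAMPLE_HTML = """
<meta content="Ranker" itemprop="name"/>
<meta content="Eminem" itemprop="name"/>
<meta content="Tupac" itemprop="name"/>
<meta content="The Notorious B.I.G." itemprop="name"/>
"""


def extract_names(html, skip=1, limit=25):
    """Return up to `limit` names, ignoring the first `skip` matches."""
    soup = BeautifulSoup(html, "html.parser")
    tags = soup.find_all("meta", attrs={"itemprop": "name"})
    # .get() returns None instead of raising when the attribute is missing,
    # so tags without a content attribute are filtered out here.
    names = [tag.get("content") for tag in tags if tag.get("content")]
    return names[skip:skip + limit]


names = extract_names(SAMPLE_HTML)
# Print one artist per line with a rank number.
for rank, name in enumerate(names, start=1):
    print(rank, name)
```

With the real page, pass r.text instead of SAMPLE_HTML. If the page layout changes and the site name is no longer the first match, the skip value would need adjusting.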
Upvotes: 1