Reputation: 117
I'm new to Python and trying to pull the Billboard Hot 100 list. I know there is already a library for this, but I'm practicing (and it's done differently there). My issue is that the list of songs doesn't line up with the list of artists, because the artist is sometimes in an "a" element and sometimes in a "span" element. How do I select both types of elements, given that both have [@class="chart-row__artist"]?
Currently I have:
artists = [x.strip() for x in tree.xpath('//a[@class="chart-row__artist"]/text()')]
but for some songs the artist is in a span instead, which I would have to select with:
artists = [x.strip() for x in tree.xpath('//span[@class="chart-row__artist"]/text()')]
It alternates on the page. Any suggestions?
Upvotes: 2
Views: 59
Reputation: 774
Is using XPath necessary? I got a list of all artists with bs4 pretty easily.
import requests
from bs4 import BeautifulSoup
# Fetch the chart page and parse it with the lxml parser
response = requests.get('https://www.billboard.com/charts/hot-100')
soup = BeautifulSoup(response.content, 'lxml')
# The CSS class selector matches any tag with this class, so both 'a' and 'span' are covered
artists = [row.text.strip() for row in soup.select('.chart-row__artist')]
print(artists)
Upvotes: 0
Reputation: 117
I think I got the syntax for XPath right. It seems like the songs are matching appropriately with artists despite the alternating element nodes for artists. I did this:
artists = [x.strip() for x in tree.xpath('//*[@class="chart-row__artist"]/text()')]
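The prefix //* selects every element in the document regardless of its tag name, and the predicate then keeps only those whose class attribute is exactly "chart-row__artist". That covers both 'a' elements and 'span' elements, and the results come back in document order, so they stay aligned with the songs.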
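To illustrate why the wildcard helps, here is a minimal sketch on made-up markup. The HTML snippet and artist names are hypothetical and are not Billboard's real page. It uses the standard library's xml.etree.ElementTree, which only supports a subset of XPath (no /text() step), so the text is read from each element's .text attribute instead of through the lxml call shown above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: the artist sits in an <a> on some rows
# and in a <span> on others, as described in the question.
SAMPLE = """
<div>
  <div class="chart-row"><a class="chart-row__artist" href="#"> Artist A </a></div>
  <div class="chart-row"><span class="chart-row__artist"> Artist B </span></div>
  <div class="chart-row"><a class="chart-row__artist" href="#"> Artist C </a></div>
</div>
"""

root = ET.fromstring(SAMPLE)

# '*' matches any tag name; the predicate keeps only elements whose
# class attribute equals this exact string.
artists = [el.text.strip()
           for el in root.iterfind('.//*[@class="chart-row__artist"]')]

# Selecting by a single tag name misses the rows that use the other element.
only_links = [el.text.strip()
              for el in root.iterfind('.//a[@class="chart-row__artist"]')]

print(artists)     # all three artists, in document order
print(only_links)  # only the artists wrapped in <a>
```

Note that the predicate compares the whole attribute value, so an element with an additional class (for example class="chart-row__artist highlighted") would not match.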
Upvotes: 1