Obtaining a column from a Wikipedia table using BeautifulSoup

Question

source_code = requests.get('http://en.wikipedia.org/wiki/Taylor_Swift_discography')
soup = BeautifulSoup(source_code.text)
tables = soup.find_all("table")

I'm trying to get a list of song names from the "List of singles" table on Taylor Swift's discography page.

The table has no unique class or id. The only unique thing I can think of is the caption tag around "List of singles..."

List of singles as main artist, with selected chart positions, sales figures and certifications

I tried:

table = soup.find_all("caption")

but it returns nothing. I'm assuming that caption is not a recognized tag in bs4?

alecxe · Accepted Answer

This actually has nothing to do with findAll() versus find_all(). findAll() was the BeautifulSoup 3 name and was kept in BeautifulSoup 4 for compatibility reasons, as this excerpt from the bs4 source code shows:

def find_all(self, name=None, attrs={}, recursive=True, text=None,
             limit=None, **kwargs):
    generator = self.descendants
    if not recursive:
        generator = self.children
    return self._find_all(name, attrs, text, limit, generator, **kwargs)

findAll = find_all       # BS3
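As a quick sanity check, here is a minimal sketch showing that both spellings return the same result and that caption is matched like any other tag. The HTML string is a made-up stand-in, not the real Wikipedia markup, and the explicit "html.parser" argument is my choice. Recent bs4 releases may emit a deprecation warning for the old name, so the sketch silences warnings around that call.

```python
import warnings

from bs4 import BeautifulSoup

# Made-up markup for illustration only (not the real Wikipedia page).
html = "<table><caption>Demo caption</caption><tr><td>1</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

# Current bs4 spelling.
new_style = soup.find_all("caption")

# BS3 spelling, kept for compatibility; newer releases may warn about it.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    old_style = soup.findAll("caption")

same = new_style == old_style
print(same, [c.get_text() for c in new_style])
```

If find_all("caption") comes back empty on the live page, the cause is more likely in what was actually downloaded or how it was parsed than in bs4 not knowing the tag.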

There is also a nicer way to get the list of singles: rely on the span element with id="Singles", which marks the start of the Singles section. Use find_next_sibling() to get the first table after the span tag's parent, then get all th elements with scope="row":

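from bs4 import BeautifulSoup
import requests


source_code = requests.get('http://en.wikipedia.org/wiki/Taylor_Swift_discography')
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(source_code.content, 'html.parser')

# The span with id="Singles" sits inside the section heading;
# the first table that follows the heading is the singles table
table = soup.find('span', id='Singles').parent.find_next_sibling('table')

# Each row header (th with scope="row") holds one song title
for single in table.find_all('th', scope='row'):
    print(single.text)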

Prints:

"Tim McGraw"
"Teardrops on My Guitar"
"Our Song"
"Picture to Burn"
"Should've Said No"
"Change"
"Love Story"
"White Horse"
"You Belong with Me"
"Fifteen"
"Fearless"
"Today Was a Fairytale"
"Mine"
"Back to December"
"Mean"
"The Story of Us"
"Sparks Fly"
"Ours"
"Safe & Sound"
(featuring The Civil Wars)
"Long Live"
(featuring Paula Fernandes)
"Eyes Open"
"We Are Never Ever Getting Back Together"
"Ronan"
"Begin Again"
"I Knew You Were Trouble"
"22"
"Highway Don't Care"
(with Tim McGraw)
"Red"
"Everything Has Changed"
(featuring Ed Sheeran)
"Sweeter Than Fiction"
"The Last Time"
(featuring Gary Lightbody)
"Shake It Off"
"Blank Space"
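For completeness, the caption-based lookup from the question can also work. Below is a minimal sketch, assuming the table carries a caption element whose text starts with a known prefix. The helper name titles_by_caption and the sample HTML are hypothetical and only loosely imitate the page structure, so treat it as an illustration rather than a drop-in solution for the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup that only imitates the page layout.
SAMPLE_HTML = """
<h2><span id="Singles">Singles</span></h2>
<table>
  <caption>List of singles as main artist</caption>
  <tr><th scope="col">Title</th><th scope="col">Year</th></tr>
  <tr><th scope="row">"Tim McGraw"</th><td>2006</td></tr>
  <tr><th scope="row">"Our Song"</th><td>2007</td></tr>
</table>
<table>
  <caption>List of promotional singles</caption>
  <tr><th scope="col">Title</th><th scope="col">Year</th></tr>
  <tr><th scope="row">"Example Promo"</th><td>2008</td></tr>
</table>
"""


def titles_by_caption(html, caption_prefix):
    """Return row-header texts of the first table whose caption starts with caption_prefix."""
    soup = BeautifulSoup(html, "html.parser")
    for caption in soup.find_all("caption"):
        if caption.get_text(strip=True).startswith(caption_prefix):
            # Walk up from the caption to the table that contains it.
            table = caption.find_parent("table")
            return [th.get_text(strip=True)
                    for th in table.find_all("th", scope="row")]
    return []  # no caption matched


singles = titles_by_caption(SAMPLE_HTML, "List of singles")
print(singles)
```

Matching on a caption prefix avoids depending on the heading markup, but it breaks if the caption wording changes, so which anchor is more stable depends on the page.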
