python BeautifulSoup table scraping

Question

My HTML has several tables. The first table is:

and the rest are of the form:

I want to scrape data from the tables. When I use:
```python
from bs4 import BeautifulSoup
from urllib.request import urlopen

url = 'XXX'  # placeholder for the page being scraped
soup = BeautifulSoup(urlopen(url).read(), "lxml")
for table in soup.findAll('table'):
    print(table)
```
it only finds the first table. When I change the search to:
```python
soup.findAll("table", {"class": "confluenceTable"})
```
it doesn't find anything. What am I missing?
I'm using Python 3.4 on Windows with BeautifulSoup 4.5.

Negev

alecxe · Accepted Answer

I suspect you are trying to scrape an Atlassian Confluence page, which is usually quite dynamic and relies heavily on JavaScript to load its content. If you look at the HTML source that urllib downloads, you will not find any table elements with the confluenceTable class.
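One way to check this on your own page is to count the `<table>` elements, and their classes, in the raw HTML before BeautifulSoup is involved at all. The sketch below uses only the standard library's `html.parser`. `TableClassCounter` and `count_tables` are hypothetical helper names, and `raw_html` is a made-up stand-in for what `urlopen(url).read()` might return, not real Confluence output.

```python
from html.parser import HTMLParser


class TableClassCounter(HTMLParser):
    """Record the class attribute of every <table> start tag in raw HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.table_classes = []  # one entry per <table>, "" if it has no class

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # attrs is a list of (name, value) pairs; value may be None
            self.table_classes.append(dict(attrs).get("class") or "")


def count_tables(html, css_class=None):
    """Count <table> elements, optionally only those carrying css_class."""
    parser = TableClassCounter()
    parser.feed(html)
    parser.close()
    if css_class is None:
        return len(parser.table_classes)
    # a class attribute may hold several space-separated class names
    return sum(css_class in classes.split() for classes in parser.table_classes)


# Hypothetical raw download: one static table plus an empty container
# that, in a browser, JavaScript would fill in after the page loads.
raw_html = """
<html><body>
  <table><tr><td>static</td></tr></table>
  <div id="placeholder"></div>
</body></html>
"""

print(count_tables(raw_html))                     # 1
print(count_tables(raw_html, "confluenceTable"))  # 0
```

Against the real page you would pass the decoded download, for example `count_tables(urlopen(url).read().decode("utf-8"), "confluenceTable")`, assuming the page is UTF-8 encoded. A result of 0 there, while the tables are visible in a browser, suggests they are added by JavaScript after the initial HTML is delivered.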

Instead, you should either look into using the Confluence API or use a browser automation tool such as Selenium.
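For the API route, here is a minimal sketch. It assumes a Confluence instance that exposes the REST endpoint `/rest/api/content/{id}` with `expand=body.storage`, which returns the page body in storage format under `body.storage.value`. `base_url`, `page_id` and `auth_header` are placeholders: most instances require authentication, and the base path differs between installations. `fetch_page_json`, `TableRowsParser` and `tables_from_page_json` are hypothetical helpers. The code uses only the standard library and avoids f-strings, so it should also run on Python 3.4.

```python
import json
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch_page_json(base_url, page_id, auth_header=None):
    """Fetch one page's storage-format body via the Confluence REST API.

    base_url, page_id and auth_header are placeholders for your instance.
    """
    url = "{}/rest/api/content/{}?expand=body.storage".format(base_url, page_id)
    req = Request(url)
    if auth_header:
        req.add_header("Authorization", auth_header)
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")


class TableRowsParser(HTMLParser):
    """Collect cell text as tables -> rows -> cells (nested tables not separated)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tables = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # text inside nested tags such as <p> still belongs to the cell
        if self._cell is not None:
            self._cell.append(data)


def tables_from_page_json(json_text):
    """Parse the API's JSON response and return the text of every table cell."""
    storage_html = json.loads(json_text)["body"]["storage"]["value"]
    parser = TableRowsParser()
    parser.feed(storage_html)
    parser.close()
    return parser.tables


# Hypothetical response body, shaped like the expand=body.storage output.
sample = json.dumps({"body": {"storage": {"value":
    "<table><tbody><tr><th>Name</th><th>Qty</th></tr>"
    "<tr><td><p>apples</p></td><td>3</td></tr></tbody></table>"}}})

print(tables_from_page_json(sample))
# [[['Name', 'Qty'], ['apples', '3']]]
```

`fetch_page_json` is not called here because it needs a live instance; its return value is the kind of JSON text `tables_from_page_json` expects. If you would rather keep using BeautifulSoup, you can pass the `body.storage.value` string to `BeautifulSoup(..., "html.parser")` and call `find_all("table")` on it, as in the question.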
