Hawkydoky

Reputation: 194

HTML in browser doesn't correspond to scraped data in python

For a project I have to scrape data from several websites, and I'm having a problem with one of them.

When I look at the page source in the browser, the data I want is in a table, so it looks easy to scrape. But when I run my script, that part of the source is missing from the output.

Here is my code. I tried different things: at first I sent no headers, then I added a User-Agent header, but it made no difference.

# import libraries
import requests
from bs4 import BeautifulSoup

# specify the url
quote_page = 'http://www.airpl.org/Pollens/pollinariums-sentinelles'

# query the website, sending the User-Agent header with the request itself
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(quote_page, headers=headers)
print(response.text)

# parse the html using Beautiful Soup and store it in the variable `soup`
soup = BeautifulSoup(response.text, 'html.parser')

# save the parsed html so it can be compared with what the browser shows
with open('allergene.txt', 'w', encoding='utf-8') as f:
    f.write(str(soup))

What I'm looking for on the website is the content that comes after "Herbacée", whose HTML looks like this:

<p class="level1">

      <img src="/static/img/state-0.png" alt="pas d'émission" class="state">

    Herbacee
  </p>

Do you have any idea what's wrong?

Thanks for your help and happy new year guys :)

Upvotes: 1

Views: 1030

Answers (1)

宏杰李

Reputation: 12158

This page uses JavaScript to render the table, so the table is not in the HTML that requests downloads. The page that actually contains the table is:

http://www.alertepollens.org/gardens/garden/1/state/

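You can find this URL in the Network tab of Chrome DevTools: reload the page with the tab open and look for the request whose response contains the table.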
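
As a rough sketch of how that could look in the asker's script, the snippet below requests the URL above directly and reads the `<p class="level1">` blocks shown in the question. The helper names `parse_states` and `fetch_states` are made up for illustration, and it is an assumption that the page at that URL uses the same markup as the snippet in the question; adjust the selectors if it differs.

```python
from bs4 import BeautifulSoup

# URL taken from the answer; that it serves the table as plain HTML is an assumption.
TABLE_URL = 'http://www.alertepollens.org/gardens/garden/1/state/'


def parse_states(html):
    """Return (name, state) pairs read from <p class="level1"> blocks.

    Hypothetical helper: it assumes the markup shown in the question, where
    the <img class="state"> alt text describes the emission state.
    """
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for p in soup.find_all('p', class_='level1'):
        img = p.find('img', class_='state')
        state = img.get('alt', '') if img else ''
        name = p.get_text(strip=True)
        rows.append((name, state))
    return rows


def fetch_states(url=TABLE_URL):
    """Download the page that holds the table and parse it (hypothetical helper)."""
    # Imported here so parse_states can be used offline on saved HTML.
    import requests

    response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=10)
    response.raise_for_status()
    return parse_states(response.text)
```

Calling `fetch_states()` performs the network request; `parse_states` can also be run on the HTML saved in `allergene.txt` to check whether the table is present in what the script downloaded.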


Upvotes: 1
