Jujimufoo

Reputation: 35

Web scraping with Python BeautifulSoup returning different HTML than expected

I am trying to scrape a surf report website with BeautifulSoup, but the returned HTML does not match what I see in a browser, so I can't extract the data I'm looking for. I want the element with the "quiver-surf-height" class, which contains the local surf height estimate, from the following page: https://www.surfline.com/surf-report/paradise-beach/584204214e65fad6a7709cc1

import requests
from bs4 import BeautifulSoup

url = "https://www.surfline.com/surf-report/paradise-beach/584204214e65fad6a7709cc1"
res = requests.get(url)
soup = BeautifulSoup(res.text, "lxml")

print(soup.select(".quiver-surf-height"))

The print statement returns an empty list. Reading through the returned HTML, I found the message "Please turn JavaScript on and reload the page." I'm following the steps laid out in a class, so I'm not sure how to handle this response. Any input is appreciated!

Upvotes: 0

Views: 277

Answers (1)

baduker

Reputation: 20052

As mentioned in the comments, the data you're after is generated dynamically. However, there's an API you can query to get what you want.

All you need is the surf spot ID and the number of days of data you want. By default, it returns 16 days of data at 1-hour intervals, but you can change these parameters too.
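As a minimal sketch of how those parameters fit together, the query string can be built from a dictionary rather than by hand. The helper name `build_wave_url` and its defaults are hypothetical; the endpoint and the parameter names (`spotId`, `days`, `intervalHours`) are the ones used in this answer's example URL.

```python
from urllib.parse import urlencode

# Endpoint as used in this answer's example URL.
WAVE_ENDPOINT = "https://services.surfline.com/kbyg/spots/forecasts/wave"


def build_wave_url(spot_id: str, days: int = 2, interval_hours: int = 1) -> str:
    """Return the wave data URL for one surf spot (hypothetical helper)."""
    # urlencode escapes the values and joins them with "&" in insertion order.
    query = urlencode(
        {"spotId": spot_id, "days": days, "intervalHours": interval_hours}
    )
    return f"{WAVE_ENDPOINT}?{query}"


print(build_wave_url("584204214e65fad6a7709cc1", days=2))
```

Keeping the parameters in one place makes it easier to change the number of days or the interval without editing the URL string.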

For example, this gets two days of surf height data at hourly intervals:

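import datetime

import requests

# The spot ID is the last path segment of the surf report URL.
surf_spot_id = "584204214e65fad6a7709cc1"
days = "2"
api_url = f"https://services.surfline.com/kbyg/spots/forecasts/wave?spotId={surf_spot_id}&days={days}&intervalHours=1"

# Send a browser-like User-Agent with the request.
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36",
}

response = requests.get(api_url, headers=headers, timeout=10)
response.raise_for_status()  # stop on an HTTP error instead of parsing an error page
data = response.json()

# With intervalHours=1, each item in data["data"]["wave"] is one hourly entry.
for entry in data["data"]["wave"]:
    # Convert the Unix timestamp to a readable local time.
    _time = (
        datetime
        .datetime
        .fromtimestamp(entry["timestamp"])
        .strftime("%Y-%m-%d %H:%M:%S")
    )
    print(_time)
    surf = entry["surf"]
    print(f"Surf: {surf['min']} - {surf['max']}")
    print(surf["humanRelation"])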

Output:

2022-09-25 06:00:00
Surf: 0.9 - 1.4
Waist to shoulder
2022-09-25 07:00:00
Surf: 0.9 - 1.4
Waist to shoulder
2022-09-25 08:00:00
Surf: 0.9 - 1.4
Waist to shoulder
2022-09-25 09:00:00
Surf: 0.9 - 1.2
Waist to chest
2022-09-25 10:00:00
Surf: 0.9 - 1.2
Waist to chest
2022-09-25 11:00:00
Surf: 0.9 - 1.2
Waist to chest
2022-09-25 12:00:00
Surf: 0.9 - 1.2
Waist to chest
2022-09-25 13:00:00
Surf: 0.9 - 1.2
Waist to chest
2022-09-25 14:00:00
Surf: 0.6 - 1.1
Thigh to stomach
2022-09-25 15:00:00
Surf: 0.6 - 1.1
Thigh to stomach
2022-09-25 16:00:00

and more ...
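If you want the values as data rather than printed text, the parsing step can be pulled into a small function and tried offline. This is only a sketch: `format_wave_entries` is a hypothetical helper, and `sample` is a hand-written payload that mirrors the fields used above (`timestamp`, `surf.min`, `surf.max`, `surf.humanRelation`), not a real API response. It formats times in UTC so the result does not depend on the machine's time zone.

```python
from datetime import datetime, timezone


def format_wave_entries(payload, tz=timezone.utc):
    """Turn a wave payload into one readable line per entry (hypothetical helper)."""
    lines = []
    for entry in payload["data"]["wave"]:
        # Timestamps are Unix seconds; pass tz explicitly to avoid local-time surprises.
        when = datetime.fromtimestamp(entry["timestamp"], tz=tz)
        surf = entry["surf"]
        stamp = when.strftime("%Y-%m-%d %H:%M")
        lines.append(
            f"{stamp} | {surf['min']} - {surf['max']} | {surf['humanRelation']}"
        )
    return lines


# Hand-written sample in the same shape as the fields read above (assumption).
sample = {
    "data": {
        "wave": [
            {
                "timestamp": 1664085600,  # 2022-09-25 06:00:00 UTC
                "surf": {"min": 0.9, "max": 1.4, "humanRelation": "Waist to shoulder"},
            }
        ]
    }
}

for line in format_wave_entries(sample):
    print(line)  # 2022-09-25 06:00 | 0.9 - 1.4 | Waist to shoulder
```

With a live response, you would pass the parsed JSON in place of `sample`.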

Upvotes: 1
