Kab2k

Scraping data from a table, but the tbody tag is missing

I'm trying to scrape the data table from this site: https://fl511.com/list/events/traffic?start=0&length=25&filters%5B0%5D%5Bi%5D=5&filters%5B0%5D%5Bs%5D=Incidents&order%5Bi%5D=8&order%5Bdir%5D=asc

Unfortunately, when I print the table, the output doesn't include the tbody tag, which is where the data is stored. All the other tags are there. Is there a workaround for this?

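from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

url = "https://fl511.com/list/events/traffic?start=0&length=25&filters%5B0%5D%5Bi%5D=5&filters%5B0%5D%5Bs%5D=Incidents&order%5Bi%5D=8&order%5Bdir%5D=asc"
req = Request(
    url,
    headers={'User-Agent': 'Mozilla/5.0'}
    )
webpage = urlopen(req).read()

soup = BeautifulSoup(webpage, 'html.parser')
table = soup.find_all('table')
print(table)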
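To check that the rows are genuinely absent from the downloaded HTML rather than being lost during parsing, you can count `<tbody>` tags and the `<tr>` rows inside them with the standard library's `html.parser`. This is a minimal sketch; `TbodyRowCounter` and `tbody_summary` are hypothetical helper names, and because it only reads the static HTML it cannot see anything the page's JavaScript adds later.

```python
from html.parser import HTMLParser


class TbodyRowCounter(HTMLParser):
    """Count <tbody> tags and the <tr> elements nested inside them."""

    def __init__(self):
        super().__init__()
        self.in_tbody = 0      # nesting depth of currently open <tbody> tags
        self.tbody_tags = 0    # total <tbody> start tags seen
        self.rows = 0          # <tr> start tags seen while inside a <tbody>

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase
        if tag == "tbody":
            self.in_tbody += 1
            self.tbody_tags += 1
        elif tag == "tr" and self.in_tbody:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "tbody" and self.in_tbody:
            self.in_tbody -= 1


def tbody_summary(html):
    """Return (number of <tbody> tags, number of <tr> rows inside them)."""
    parser = TbodyRowCounter()
    parser.feed(html)
    parser.close()
    return parser.tbody_tags, parser.rows
```

For example, call `print(tbody_summary(webpage.decode("utf-8", errors="replace")))` after the code above; if the second number is 0, the server response contains no rows inside a `<tbody>`, so the table must be filled in client-side.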

Answers (1)

Andrej Kesely

The data is loaded from an external source via JavaScript, so it is not part of the HTML that urlopen downloads. You can use this example to load the data directly:

import json
import requests

# Request body in the DataTables server-side format that the page's JavaScript sends
data = {
    "draw": 1,
    "columns": [
        {
            "data": None,
            "name": "",
            "searchable": False,
            "orderable": False,
            "search": {"value": "", "regex": False},
            "title": "",
            "visible": True,
            "isUtcDate": False,
            "isCollection": False,
        },
        {
            "data": "region",
            "name": "region",
            "searchable": False,
            "orderable": True,
            "search": {"value": "", "regex": False},
            "isUtcDate": False,
            "isCollection": False,
        },
        {
            "data": "county",
            "name": "county",
            "searchable": False,
            "orderable": True,
            "search": {"value": "", "regex": False},
            "isUtcDate": False,
            "isCollection": False,
        },
        {
            "data": "roadwayName",
            "name": "roadwayName",
            "searchable": False,
            "orderable": True,
            "search": {"value": "", "regex": False},
            "isUtcDate": False,
            "isCollection": False,
        },
        {
            "data": "direction",
            "name": "direction",
            "searchable": False,
            "orderable": True,
            "search": {"value": "", "regex": False},
            "isUtcDate": False,
            "isCollection": False,
        },
        {
            "data": "type",
            "name": "type",
            "searchable": False,
            "orderable": True,
            "search": {"value": "Incidents", "regex": False},
            "isUtcDate": False,
            "isCollection": False,
        },
        {
            "data": "severity",
            "name": "severity",
            "searchable": False,
            "orderable": True,
            "search": {"value": "", "regex": False},
            "isUtcDate": False,
            "isCollection": False,
        },
        {
            "data": "description",
            "name": "description",
            "searchable": False,
            "orderable": False,
            "search": {"value": "", "regex": False},
            "isUtcDate": False,
            "isCollection": False,
        },
        {
            "data": "startTime",
            "name": "startTime",
            "searchable": False,
            "orderable": True,
            "search": {"value": "", "regex": False},
            "isUtcDate": False,
            "isCollection": False,
        },
        {
            "data": "lastUpdated",
            "name": "lastUpdated",
            "searchable": False,
            "orderable": True,
            "search": {"value": "", "regex": False},
            "isUtcDate": False,
            "isCollection": False,
        },
        {
            "data": 10,
            "name": "",
            "searchable": False,
            "orderable": False,
            "search": {"value": "", "regex": False},
            "isUtcDate": False,
            "isCollection": False,
        },
    ],
    "order": [{"column": 8, "dir": "asc"}],
    "start": 0,
    "length": 25,
    "search": {"value": "", "regex": False},
}

url = "https://fl511.com/List/GetData/traffic"

# POST the payload as JSON; the response is JSON with "data", "recordsTotal" and "recordsFiltered"
data = requests.post(url, json=data).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for i, d in enumerate(data["data"], 1):
    print(i, d["description"])

print()
print("Records total:", data["recordsTotal"])
print("Records filtered:", data["recordsFiltered"])

Prints:

1 Crash in Highlands County on US-27 South, at Lake Josephine Dr. Right lane blocked. Last updated at 04:24 PM.
2 Emergency vehicles in Highlands County on US-27 North, at Lake Josephine Dr. Right lane blocked. Last updated at 04:25 PM.
3 Crash in Manatee County on US-41 North, at Pearl Ave. All lanes blocked. Last updated at 04:29 PM.
4 Crash in Polk County on I-4 East, beyond CR-557. 2 Left lanes blocked. Last updated at 04:32 PM.
5 Emergency vehicles in Manatee County on US-41 South, at Pearl Ave. Left lane blocked. Last updated at 04:35 PM.
6 Crash in Miami-Dade County on I-195 East, beyond North Miami Ave. Right lane blocked. Last updated at 05:03 PM.
7 Crash in Santa Rosa County on I-10 East, ramp to Exit 22 (SR-281/Avalon Blvd). Right shoulder blocked. Last updated at 05:05 PM.
8 Emergency vehicles in Santa Rosa County on I-10 West, at Exit 22 (SR-281/Avalon Blvd). Left shoulder blocked. Last updated at 05:02 PM.
9 Multi-vehicle crash in Duval County on I-295 E South, before Between Atlantic Blvd/St Johns Bluff Rd. Left shoulder blocked. Last updated at 05:30 PM.

Records total: 93
Records filtered: 9
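Since the request takes `start`/`length` and the response reports `recordsFiltered`, the same call can be repeated to page through larger result sets and save the rows as a table. This is a minimal sketch under assumptions: it assumes the endpoint honors increasing `start` offsets the way DataTables server-side processing normally does, and that each record carries keys matching the `data` names in the payload (`region`, `county`, ...). `page_starts`, `fetch_all`, `to_csv` and `FIELDS` are hypothetical names.

```python
import csv
import io
import math

URL = "https://fl511.com/List/GetData/traffic"

# Columns to export; assumed to match the "data" keys used in the payload
FIELDS = ["region", "county", "roadwayName", "direction", "type",
          "severity", "description", "startTime", "lastUpdated"]


def page_starts(records_filtered, length):
    """Return the "start" offsets needed to cover records_filtered rows."""
    pages = math.ceil(records_filtered / length)
    return [i * length for i in range(pages)]


def fetch_all(payload, length=25):
    """POST one request per page and return every filtered record.

    `payload` is the request dict from the answer (before `data` is
    reassigned to the response).
    """
    import requests  # third-party; imported here so the other helpers work without it

    payload = dict(payload, start=0, length=length)
    first = requests.post(URL, json=payload, timeout=30).json()
    rows = list(first["data"])
    # The first page is already fetched, so skip offset 0
    for start in page_starts(first["recordsFiltered"], length)[1:]:
        page = requests.post(URL, json=dict(payload, start=start), timeout=30).json()
        rows.extend(page["data"])
    return rows


def to_csv(rows, fields=FIELDS):
    """Serialize the selected fields of each record as CSV text."""
    buf = io.StringIO()
    # Missing keys become empty cells; keys not in `fields` are dropped
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Note that the answer's code reuses the name `data` for the response, so keep a reference to the payload first (e.g. `payload = data` before the `requests.post(...)` line), then write `to_csv(fetch_all(payload))` to a file opened with `newline=""`.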
