user3288051

Reputation: 614

How to access specific table shown in inspect using Python and BeautifulSoup for web scraping

I am scraping a website with Python and BeautifulSoup. My goal is to pull the member data from https://thehia.org/directory?&tab=1, which lists around 1685 records.

When I view the page source in Chrome, I cannot find the table; it seems the data is loaded dynamically. But when I use Chrome's Inspect option, I can find the "membersTable" table I need inside a div.


How can I use BeautifulSoup to access the membersTable that I can see in Inspect?

Upvotes: 1

Views: 1040

Answers (2)

QHarr

Reputation: 84465

You can mimic the POST request the page makes to load its content, then use hjson to handle the unquoted keys in the string pulled out of the response:

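import requests, hjson
import pandas as pd

# Form payload sent with the POST request that loads the member list
payload = {'formId': '3721260'}
r = requests.post('https://thehia.org/Sys/MemberDirectory/LoadMembers', data=payload)
# Strip the leading 'while(1); ' guard before parsing the response body
data = hjson.loads(r.text.replace('while(1); ', ''))
total = data['TotalCount']
# 'JsonStructure' is itself a string, so it has to be parsed a second time
structure = data['JsonStructure']
members = hjson.loads(structure)
# Each member maps column keys to a list whose first item holds the value under 'v'
df = pd.DataFrame([[member[k][0]['v'] for k in member.keys()] for member in members['members'][0]],
                  columns=['Organisation', 'City', 'State', 'Country'])
print(df)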
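
To see the cleaning and flattening steps without hitting the site, here is a minimal offline sketch. The sample payload, its key names (c1 to c4), and the helper names strip_prefix and members_to_rows are made up for illustration; only the nesting (members -> first list -> key -> first item -> 'v') is taken from the indexing in the code above. It uses the standard json module because the sample has quoted keys; the real response has unquoted keys, which is why hjson is needed there.

```python
import json

ANTI_HIJACK_PREFIX = "while(1); "  # guard the response is assumed to start with


def strip_prefix(text, prefix=ANTI_HIJACK_PREFIX):
    """Remove the leading guard if present, otherwise return text unchanged."""
    return text[len(prefix):] if text.startswith(prefix) else text


def members_to_rows(members):
    """Flatten each member dict into a list of its 'v' values, in key order."""
    return [[member[key][0]["v"] for key in member]
            for member in members["members"][0]]


# Hypothetical sample: invented keys and values, nesting copied from the answer.
inner = {"members": [[
    {"c1": [{"v": "Acme Health"}], "c2": [{"v": "Austin"}],
     "c3": [{"v": "TX"}], "c4": [{"v": "USA"}]},
    {"c1": [{"v": "Beta Clinic"}], "c2": [{"v": "Leeds"}],
     "c3": [{"v": ""}], "c4": [{"v": "UK"}]},
]]}
sample_text = ANTI_HIJACK_PREFIX + json.dumps(
    {"TotalCount": 2, "JsonStructure": json.dumps(inner)}
)

outer = json.loads(strip_prefix(sample_text))              # first parse: envelope
rows = members_to_rows(json.loads(outer["JsonStructure"]))  # second parse: members
print(outer["TotalCount"], rows)
```

Using startswith rather than replace only touches the guard at the very start of the body, so the same text appearing later inside a value would be left alone.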


Upvotes: 1

Rajith Thennakoon

Reputation: 4130

Try this one

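import requests
from bs4 import BeautifulSoup

url = "https://thehia.org/directory?&tab=1"
response = requests.get(url)
html = response.content

# Name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', attrs={'class': 'membersTable'})

row_list = []
# find() returns None when the table is missing from the downloaded HTML
if table is not None:
    for row in table.find_all('tr', {'class': ['normal']}):
        # Collect the text of every cell in the row
        data = [cell.text for cell in row.find_all('td')]
        row_list.append(data)

print(row_list)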
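
Since the question says the table is missing from the page source, it may be worth checking whether the downloaded HTML contains the table at all before parsing it. The sketch below uses the standard library's html.parser instead of BeautifulSoup so it runs without extra installs. Both HTML strings are invented stand-ins (not copied from the site), and TableFinder and has_table are hypothetical helper names.

```python
from html.parser import HTMLParser


class TableFinder(HTMLParser):
    """Records whether a <table> with the given class or id appears."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag != "table":
            return
        attr_map = dict(attrs)
        # Attribute values can be None, so fall back to an empty string
        classes = (attr_map.get("class") or "").split()
        if self.name in classes or attr_map.get("id") == self.name:
            self.found = True


def has_table(html, name="membersTable"):
    finder = TableFinder(name)
    finder.feed(html)
    return finder.found


# Invented examples: what requests might receive vs. what Inspect might show.
STATIC_HTML = '<div id="membersContainer"></div>'
RENDERED_HTML = (
    '<div><table id="membersTable" class="membersTable">'
    '<tr class="normal"><td>Acme Health</td></tr></table></div>'
)

print(has_table(STATIC_HTML))
print(has_table(RENDERED_HTML))
```

If the check fails on response.text from requests.get, the table is being filled in by JavaScript after the page loads, and parsing the static HTML will not find it.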

Upvotes: 0
