Reputation: 614

How to access specific table shown in inspect using Python and BeautifulSoup for web scraping

I am working on web scraping using Python and BeautifulSoup. My goal is to pull member data from https://thehia.org/directory?&tab=1. There are around 1685 records.

When I view the page source in Chrome, I cannot find the table; it seems the data is loaded dynamically. But when I use Chrome's Inspect option, I can find the "membersTable" table I need inside a div.

How can I use BeautifulSoup to access the membersTable that I can see in Inspect?

Upvotes: 1

Answers (2)

QHarr

Reputation: 84465

You can mimic the POST request the page makes for its content, then use hjson to handle the unquoted keys in the string pulled out of the response.

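import hjson
import pandas as pd
import requests

# Form payload sent with the page's own POST request
payload = {'formId': '3721260'}
r = requests.post('https://thehia.org/Sys/MemberDirectory/LoadMembers', data=payload)

# Strip the 'while(1); ' prefix, then parse; hjson tolerates unquoted keys
data = hjson.loads(r.text.replace('while(1); ', ''))
total = data['TotalCount']
# JsonStructure is itself a string that needs a second parse
members = hjson.loads(data['JsonStructure'])

# One row per member, taking the first 'v' value of each field
df = pd.DataFrame(
    [[member[k][0]['v'] for k in member.keys()] for member in members['members'][0]],
    columns=['Organisation', 'City', 'State', 'Country'],
)
print(df)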
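
The two processing steps in this answer, stripping the `while(1); ` guard and flattening each member's fields, can be tried offline. This is only a sketch: the sample payload is made up to mirror the shape the code above indexes into (`members['members'][0]`, then `member[k][0]['v']`), and the field keys `c1` to `c4` and the helper names are hypothetical. It uses quoted keys so the standard `json` module is enough, whereas the real response reportedly needs `hjson`.

```python
import json

GUARD = "while(1); "  # prefix the answer above strips from the response text


def strip_guard(text: str) -> str:
    """Remove the leading guard, if present, so the rest can be parsed."""
    return text[len(GUARD):] if text.startswith(GUARD) else text


def flatten_members(structure: dict) -> list:
    """Turn each member dict into a flat row of its first 'v' values."""
    return [
        [fields[0]["v"] for fields in member.values()]
        for member in structure["members"][0]
    ]


# Hypothetical payload; the real field keys and values will differ.
sample_structure = {
    "members": [[
        {"c1": [{"v": "Example Org"}], "c2": [{"v": "Springfield"}],
         "c3": [{"v": "IL"}], "c4": [{"v": "USA"}]},
    ]]
}
# The outer object carries the member structure as a JSON string,
# which is why the answer parses twice.
sample_response = GUARD + json.dumps(
    {"TotalCount": 1, "JsonStructure": json.dumps(sample_structure)}
)

outer = json.loads(strip_guard(sample_response))
rows = flatten_members(json.loads(outer["JsonStructure"]))
print(outer["TotalCount"], rows)
```

Checking the prefix with `startswith` rather than a blanket `replace` avoids touching the same characters if they ever appear inside a field value.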

Upvotes: 1

Rajith Thennakoon

Reputation: 4130

Try this:

import requests
from bs4 import BeautifulSoup

url = "https://thehia.org/directory?&tab=1"
response = requests.get(url)
html = response.content

# Name the parser explicitly to avoid a bs4 warning
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', attrs={'class': 'membersTable'})

row_list = []
# find() returns None when the table is not in the fetched HTML
if table is not None:
    for row in table.find_all('tr', {'class': ['normal']}):
        data = []
        for cell in row.find_all('td'):
            data.append(cell.text)
        row_list.append(data)

print(row_list)
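
Since the question says the table is missing from the page source, it may be worth checking what a plain GET actually returns before parsing it. Below is a minimal sketch using only the standard library; the `TableFinder` class, the `has_table_with_class` helper, and both HTML snippets are hypothetical stand-ins, not the real page.

```python
from html.parser import HTMLParser


class TableFinder(HTMLParser):
    """Records whether any <table> start tag carries the wanted class."""

    def __init__(self, wanted_class: str):
        super().__init__()
        self.wanted_class = wanted_class
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # The class attribute may be absent or empty
            classes = (dict(attrs).get("class") or "").split()
            if self.wanted_class in classes:
                self.found = True


def has_table_with_class(html: str, wanted_class: str) -> bool:
    finder = TableFinder(wanted_class)
    finder.feed(html)
    return finder.found


# Made-up snippets: one without the table, one with it
static_html = '<div id="membersContainer"></div>'
rendered_html = '<div><table class="membersTable"><tr><td>x</td></tr></table></div>'

print(has_table_with_class(static_html, "membersTable"))
print(has_table_with_class(rendered_html, "membersTable"))
```

Passing `requests.get(url).text` in place of `static_html` would show whether the table is present in the raw response. If it is not, no parser can find it there, and the POST approach in the other answer is the likelier route.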

Upvotes: 0
