yjseo0227

Reputation: 1

Scraping multiple pages in Beautiful Soup

I'm trying to scrape all the pages of this website: https://www.edison.k12.nj.us/directory?const_page=1&. I thought I could go to the next page by replacing the number 1 with 2, 3, 4 and so on. However, that doesn't work: when I checked the href attribute of the tags, it doesn't seem to link to a new page. How can I scrape multiple pages in this case? Thank you so much!

import pandas as pd

# create_beautiful, extract_data and to_csv_and_excel are helper functions defined elsewhere (not shown)
df_list = []
for page in range(1, 240):
    url = 'https://www.edison.k12.nj.us/directory?const_page=' + str(page) + '&'
    # gets back the BeautifulSoup object
    bs = create_beautiful(url)

    # calls extract_data() to get the necessary data as a DataFrame
    df_list.append(extract_data(bs))

# DataFrame.append returns a new frame instead of modifying df in place,
# so combine all the pages in one step with pd.concat
df = pd.concat(df_list, ignore_index=True)

to_csv_and_excel(df, 'edison_township_public')

Upvotes: 0

Views: 93

Answers (1)

Dmitriy Zub

Reputation: 1724

You can check whether any requests are being sent to or from the server in dev tools -> Network -> Fetch/XHR tab. Click on the next page and you'll see this link in the Headers tab:

https://www.edison.k12.nj.us/fs/elements/59?const_page=1&is_draft=false&is_load_more=true&parent_id=59&_=1629643598511
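To see which parts of that URL are parameters, the query string can be split with the standard library. This is a minimal sketch that assumes the URL copied above; the variable names are only illustrative.

```python
from urllib.parse import urlsplit, parse_qsl

# URL copied from the Headers tab of the Fetch/XHR request
url = ("https://www.edison.k12.nj.us/fs/elements/59"
       "?const_page=1&is_draft=false&is_load_more=true&parent_id=59&_=1629643598511")

parts = urlsplit(url)

# Everything before the "?" is the endpoint to request
endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"

# Everything after the "?" becomes a dict of query parameters (all values are strings)
params = dict(parse_qsl(parts.query))

print(endpoint)
print(params)
```

A dict like this can be passed as params= to requests.get, so only the values that differ between pages need to change.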

You can try a very basic for ... in range() loop and replace the values in const_page={VALUE} and _=162964359851{VALUE} with the loop value. The other parameters, such as parent_id=59, stay the same.

Note: this sends one request at a time, so it is slow. Replace it with a faster solution if needed.

import requests
from bs4 import BeautifulSoup

for index in range(1, 240):

  params = {
    'const_page': index,
    'is_draft': 'false',
    'is_load_more': 'true',
    'parent_id': '59',
    '_': f'162964359851{index}'  # only the LAST number changes on each page; it matches the const_page number
  }

  html = requests.get('https://www.edison.k12.nj.us/fs/elements/59', params=params)
  soup = BeautifulSoup(html.text, 'lxml')

  # select_one returns only the first matching element on the page
  title = soup.select_one('.fsConstituentProfileLink').text
  print(title)

--------
'''
Donna Abatemarco 
Irina Acha 
Philip Adornato 
Victoria Ajijedidun 
Taylor Aljian 
Kelly Amabile 
Elizabeth Andrade 
Deliane Antonio 
Nicole Aravena 
Pamela Aurilio 
Aimee Baer 
Sharmila Balaji 
Meghan Banach 
... more names
'''
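If the sequential loop is too slow, one possible speed-up is to download several pages in parallel with a thread pool. The sketch below is an assumption, not a tested solution for this site: build_url, fetch and fetch_pages are hypothetical helper names, the `_` value simply follows the same pattern as above, and it uses the standard library's urllib instead of requests so that it is self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "https://www.edison.k12.nj.us/fs/elements/59"


def build_url(page):
    """Build the request URL for one directory page (hypothetical helper)."""
    params = {
        "const_page": page,
        "is_draft": "false",
        "is_load_more": "true",
        "parent_id": "59",
        "_": f"162964359851{page}",  # same pattern as in the loop above
    }
    return f"{ENDPOINT}?{urlencode(params)}"


def fetch(url):
    """Download one URL and return its body as text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_pages(pages, fetcher=fetch, max_workers=8):
    """Fetch several pages in parallel; results keep the order of `pages`."""
    urls = [build_url(page) for page in pages]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, not completion order
        return list(pool.map(fetcher, urls))
```

Calling fetch_pages(range(1, 240)) would return a list of HTML strings, each of which can be passed to BeautifulSoup as before. Keeping max_workers small avoids sending too many requests to the server at once.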

Upvotes: 1
