mlost

Reputation: 69

Python Webscrape with Beautiful Soup

I am new to Python and working on a web scraper. My issue is that my list is only populated with one link from each category. The length of the output is 9, but it should be 25. I am pretty sure my error has something to do with where I use l=[] and d={}, but I am not sure.

Any help would be appreciated.

import requests
from bs4 import BeautifulSoup
import gspread
import re
#import pandas as pd

url = 'https://www.astro.org/Patient-Care-and-Research/Clinical-Practice-Statements/Clinical-Practice-Guidelines'

r=requests.get(url)
c=r.content

soup=BeautifulSoup(c,'lxml')

all=soup.find_all('div', {'class':'panel-body'})

l=[]
for item in all:
      
    try:
        links=item.find_all('a')
        for a in links:
            d={}
            d['link']=zurl= ("https://www.astro.org" + a['href'])
            r2=requests.get(zurl)
            c2=r2.content
            soup2=BeautifulSoup(c2,'html.parser')
            title=soup2.select('#form > div.wrapper.interior-page > section:nth-child(6) > div > div > div.col-md-8.col-md-offset-1.col-sm-8.col-sm-offset-1.col-xs-12.floatright > div:nth-child(1) > div > h1')
            titlelst = title[:len(title)]
            titleparagraph = []
            for x in titlelst:
                titleparagraph.append(str(x.text))
                d['title']=("".join(map(str,titleparagraph)))
            all3=soup2.select('#form > div.wrapper.interior-page > section:nth-child(6) > div > div > div.col-md-8.col-md-offset-1.col-sm-8.col-sm-offset-1.col-xs-12.floatright > div:nth-child(2) > div')
            lst = all3[:len(all3)]
            paragraphs = []
            for x in lst:
                paragraphs.append(str(x.text))
                d['full']=("".join(map(str,paragraphs)))
                lplinks=x.find_all('a')
                lplinklist = []
                for a in lplinks:
                    lplinklist.append(str(a['href'])+'\n')
                    d['link2']=("".join(map(str,lplinklist)))     
                    
    except:
        print(None)
     
    l.append(d)
    print(len(l))

Upvotes: 0

Views: 42

Answers (1)

Arthur Pereira

Reputation: 1559

You put l.append(d) outside the inner for loop, so it runs once per item and only appends the last d built from that item's links. Move it to the end of the inner loop body, so it runs once per link, and it will work fine:

for item in all:

    try:
        links = item.find_all('a')
        for a in links:
            ... 
            ...    

            l.append(d)

    except:
        print(None)

print(len(l)) # prints 25
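
To see the effect of the append placement without hitting the network, here is a minimal sketch. The `categories` data and both function names are made up for illustration; they stand in for the panels and links on the real page, which are not reproduced here.

```python
# Hypothetical stand-in data: three categories holding 2, 3 and 1 links.
categories = [["/a1", "/a2"], ["/b1", "/b2", "/b3"], ["/c1"]]


def collect_append_outside(groups):
    """Mirrors the question: append sits outside the inner loop."""
    rows = []
    for links in groups:
        for href in links:
            d = {"link": "https://www.astro.org" + href}
        # Runs once per category, so only the last d of each one is kept.
        rows.append(d)
    return rows


def collect_append_inside(groups):
    """Mirrors the fix: append sits at the end of the inner loop body."""
    rows = []
    for links in groups:
        for href in links:
            d = {"link": "https://www.astro.org" + href}
            # Runs once per link, so every d is kept.
            rows.append(d)
    return rows


print(len(collect_append_outside(categories)))  # 3, one per category
print(len(collect_append_inside(categories)))   # 6, one per link
```

With the append outside, the result length equals the number of categories; with it inside, it equals the total number of links, which matches the 9 versus 25 seen in the question.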

Upvotes: 1
