Reputation: 69
I am new to Python and working on a web scraper. My issue is that my list is only being populated with the first link in each category. The length of the output is 9, but it should be 25. I am pretty sure my error has something to do with my l=[] and d={}, but I'm not sure.
Any help would be appreciated.
import requests
from bs4 import BeautifulSoup
import gspread
import re
#import pandas as pd

url = 'https://www.astro.org/Patient-Care-and-Research/Clinical-Practice-Statements/Clinical-Practice-Guidelines'
r=requests.get(url)
c=r.content
soup=BeautifulSoup(c,'lxml')
all=soup.find_all('div', {'class':'panel-body'})
l=[]
for item in all:
    try:
        links=item.find_all('a')
        for a in links:
            d={}
            d['link']=zurl= ("https://www.astro.org" + a['href'])
            r2=requests.get(zurl)
            c2=r2.content
            soup2=BeautifulSoup(c2,'html.parser')
            title=soup2.select('#form > div.wrapper.interior-page > section:nth-child(6) > div > div > div.col-md-8.col-md-offset-1.col-sm-8.col-sm-offset-1.col-xs-12.floatright > div:nth-child(1) > div > h1')
            titlelst = title[:len(title)]
            titleparagraph = []
            for x in titlelst:
                titleparagraph.append(str(x.text))
            d['title']=("".join(map(str,titleparagraph)))
            all3=soup2.select('#form > div.wrapper.interior-page > section:nth-child(6) > div > div > div.col-md-8.col-md-offset-1.col-sm-8.col-sm-offset-1.col-xs-12.floatright > div:nth-child(2) > div')
            lst = all3[:len(all3)]
            paragraphs = []
            for x in lst:
                paragraphs.append(str(x.text))
            d['full']=("".join(map(str,paragraphs)))
            lplinks=x.find_all('a')
            lplinklist = []
            for a in lplinks:
                lplinklist.append(str(a['href'])+'\n')
            d['link2']=("".join(map(str,lplinklist)))
    except:
        print(None)
    l.append(d)
print(len(l))
Upvotes: 0
Views: 42
Reputation: 1559
You placed l.append(d) outside the inner for loop, so it runs once per item and only appends the last d built for that item's links. Move it to the end of the inner loop body and it will work fine:
for item in all:
    try:
        links = item.find_all('a')
        for a in links:
            ...
            ...
            l.append(d)
    except:
        print(None)
print(len(l)) # prints 25
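To see the effect of the indentation without hitting the website, here is a minimal sketch with stand-in data. The names panels, collect_outside and collect_inside are hypothetical and not part of the original script; each inner list just plays the role of the links found in one panel.

```python
# Hypothetical stand-in data: each inner list represents the links in one panel.
panels = [["a1", "a2", "a3"], ["b1"], ["c1", "c2"]]


def collect_outside(panels):
    """Append after the inner loop ends: one record per panel."""
    results = []
    for links in panels:
        record = None
        for href in links:
            record = {"link": href}
        # Runs once per panel, so only the last record of each panel is kept.
        results.append(record)
    return results


def collect_inside(panels):
    """Append inside the inner loop: one record per link."""
    results = []
    for links in panels:
        for href in links:
            record = {"link": href}
            results.append(record)
    return results


print(len(collect_outside(panels)))  # 3, one per panel
print(len(collect_inside(panels)))   # 6, one per link
```

The same pattern would explain getting 9 entries (one per panel) instead of 25 (one per link) in the question.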
Upvotes: 1