Reputation: 351
I want to append indexed values from one list to other lists. When I do this in a loop over several URLs, it returns only a single value, repeated, rather than one value per URL.
For example:
import requests
from bs4 import BeautifulSoup

href_url = ['https://www.nhs.uk/Services/Trusts/Overview/DefaultView.aspx?id=103',
            'https://www.nhs.uk/Services/Trusts/Overview/DefaultView.aspx?id=827']
storeit = {'phone':[],'hospital':[],'postcode':[],'link':[]}
strip = []
for i in range(0, 2, 1):
    r = requests.get(href_url[i])
    soup = BeautifulSoup(r.content, 'lxml')
    codes = soup.find('div',{'class':'panel-content'}).find_all('p')
    if codes!=None:
        for h in codes:
            strip.append(h.text.strip())
        list_data = [l.split(',') for l in strip[0].split('\n') if l]  # problem starts here
        storeit['phone'].append(list_data[::4])
        storeit['hospital'].append(list_data[1::3][0][0])
        storeit['postcode'].append(list_data[2::3][0][3])
        storeit['link'].append(list_data[3::3])
When I print the elements of list_data using the indexing notations, I get:
[['01535 652511']]
[['01535 652511']]
Airedale General Hospital
Airedale General Hospital
BD20 6TD
BD20 6TD
[['http://www.airedale-trust.nhs.uk/']]
[['http://www.airedale-trust.nhs.uk/']]
The values just repeat, and I think it's because of strip[0]. However, if I remove the [0], I get the error:
'list' object has no attribute 'split'
That's because I can only split a string, not a list of strings. How can I overcome this?
Upvotes: 1
Views: 50
Reputation: 195563
Here is an example of how to parse the information about the hospitals and add it to the storeit dictionary you have prepared:
import requests
from bs4 import BeautifulSoup
href_url = [
    "https://www.nhs.uk/Services/Trusts/Overview/DefaultView.aspx?id=103",
    "https://www.nhs.uk/Services/Trusts/Overview/DefaultView.aspx?id=827",
]
storeit = {"phone": [], "hospital": [], "postcode": [], "link": []}
for url in href_url:
    soup = BeautifulSoup(requests.get(url).content, "html.parser")
    storeit["phone"].append(soup.select_one('[property="telephone"]').text)
    # The address is comma-separated: first part is the name, last is the postcode
    txt = (
        soup.select_one('[typeof="PostalAddress"]')
        .get_text(strip=True)
        .split(",")
    )
    storeit["hospital"].append(txt[0])
    storeit["postcode"].append(txt[-1])
    storeit["link"].append(soup.select_one('[property="url"]')["href"])
print(storeit)
Prints:
{
"phone": ["01535 652511", "0151 228 4811"],
"hospital": ["Airedale General Hospital", "Alder Hey Children's Hospital"],
"postcode": ["BD20 6TD", "L12 2AP"],
"link": ["http://www.airedale-trust.nhs.uk/", "http://www.alderhey.nhs.uk"],
}
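As for why your original loop repeats the first hospital: strip is created once, before the loop, and every page appends to it, so strip[0] is always the first paragraph of the first page. The sketch below reproduces that offline. The pages data and the three function names are made up for illustration (they are not real page content or part of any library); only the list handling mirrors your code.

```python
# Hypothetical stand-in for the stripped <p> texts of each page (not real page content).
pages = [
    ["Airedale General Hospital,Example Road\nBD20 6TD", "01535 652511"],
    ["Alder Hey Children's Hospital,Sample Street\nL12 2AP", "0151 228 4811"],
]


def first_lines_shared(pages):
    """Mimics the question: one list shared by all pages, always reading index 0."""
    strip = []  # created once, so it keeps growing across pages
    seen = []
    for paragraphs in pages:
        strip.extend(paragraphs)
        seen.append(strip[0].split("\n")[0])  # always the first page's paragraph
    return seen


def first_lines_per_page(pages):
    """Rebuilds the list for every page, so index 0 belongs to the current page."""
    seen = []
    for paragraphs in pages:
        strip = [p.strip() for p in paragraphs]  # fresh list on each iteration
        seen.append(strip[0].split("\n")[0])
    return seen


def split_each(paragraphs):
    """Splits every string in the list; calling .split on the list itself fails."""
    return [line.split(",") for p in paragraphs for line in p.split("\n") if line]


print(first_lines_shared(pages))    # the same first line twice
print(first_lines_per_page(pages))  # one first line per page
print(split_each(pages[0]))
```

So in your version, moving strip = [] inside the for loop (or looping over the strings and splitting each one, as split_each does) should be enough to avoid both the repeated values and the 'list' object has no attribute 'split' error.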
Upvotes: 1