Issue with scraping in Python

Question

I am trying to scrape specific lines from a page (URL attached) and build a table from the collected data, but I cannot get anything more specific than the entire body text, so I am stuck.

To give an example:

I would like to arrive at the table below by scraping details from the body content. All the details are there, but any help on how to retrieve them in the form given below would be much appreciated.

My code is:

import requests
from bs4 import BeautifulSoup
# providing url
url = 'https://www.polskawliczbach.pl/wies_Baniocha'

# creating request object
req = requests.get(url)

# creating soup object
data = BeautifulSoup(req.text, 'html')

# finding all li tags in ul and printing the text within it
data1 = data.find('body')
for li in data1.find_all("li"):
   print(li.text, end=" ")

imxitiz · Accepted Answer

First find the ul, then find the li elements inside it. Extract the data you need, store it in a variable, and build the table with pandas. After that, either save the table to a CSV file or just print it.

Here's the code implementing all of the above:

from bs4 import BeautifulSoup
import requests
import pandas as pd

page = requests.get('https://www.polskawliczbach.pl/wies_Baniocha')
soup = BeautifulSoup(page.content, 'lxml')

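# Take the second "list-group row" list and skip its first and last <li>
lis = soup.find_all("ul", class_="list-group row")[1].find_all("li")[1:-1]
dic = {"name": [], "value": []}
for li in lis:
    try:
        # The label is the <li>'s own direct text; the value sits in its <span>
        name = li.find(string=True, recursive=False).strip()
        value = li.find("span").text.replace(" ", "")
    except AttributeError:
        # Skip items that lack a direct text node or a <span>
        continue
    # Append both together so the two columns always stay the same length
    dic["name"].append(name)
    dic["value"].append(value)
    print(name, value)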

df=pd.DataFrame(dic)

print(df)
# To save the table to a file, uncomment the following line:
# df.to_csv(".csv")
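The extraction step can also be tried offline. The sketch below runs the same idea (direct text of each li as the name, the span text as the value) against a small HTML string. The markup in SAMPLE_HTML and the helper name extract_rows are hypothetical, made up to resemble one list on the page, and html.parser is swapped in for lxml so no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to resemble one list on the page (not copied from it).
SAMPLE_HTML = """
<ul class="list-group row">
  <li class="list-group-item">Header</li>
  <li class="list-group-item">Population <span class="badge">2 730</span></li>
  <li class="list-group-item">Area <span class="badge">5,2 km2</span></li>
  <li class="list-group-item">Footer</li>
</ul>
"""


def extract_rows(html):
    """Return one {"name", "value"} dict per data <li> (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Skip the first and last <li>, as the answer's code does.
    for li in soup.select("ul.list-group li")[1:-1]:
        label = li.find(string=True, recursive=False)  # the <li>'s own text node
        badge = li.find("span")                         # the element holding the value
        if label is None or badge is None:
            continue  # not a name/value item
        rows.append({"name": label.strip(), "value": badge.get_text(strip=True)})
    return rows


for row in extract_rows(SAMPLE_HTML):
    print(row["name"], row["value"])
```

A list of dicts like this can be passed straight to pd.DataFrame(...) to get the same two-column table.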

Additionally, if you want to scrape every "category" on the page: I don't understand the language, so I can't tell which entries are useful and which are not, but here is the code anyway. Just replace this part of the code above:

soup = BeautifulSoup(page.content, 'lxml')

dic = {"name": [], "value": []}
# Walk every "list-group row" list, skipping each list's first and last <li>
for ul in soup.find_all("ul", class_="list-group row"):
    for li in ul.find_all("li")[1:-1]:
        try:
            name = li.find(string=True, recursive=False).strip()
            # Strip spaces and commas from the value text
            value = li.find("span").text.replace(" ", "").replace(",", "")
        except AttributeError:
            # Skip items that lack a direct text node or a <span>
            continue
        print(name, "\t", value)
        dic["name"].append(name)
        dic["value"].append(value)

df=pd.DataFrame(dic)
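The values collected above are still strings. If numbers are wanted, one option is a small converter like the sketch below. It assumes the site writes numbers with spaces as thousands separators and a comma as the decimal separator, which is an assumption not confirmed here; under that assumption, deleting commas would turn "12,5" into "125". The helper name parse_number is hypothetical.

```python
from typing import Optional


def parse_number(raw: str) -> Optional[float]:
    """Convert text such as "2 730" or "12,5" to a float (hypothetical helper).

    Assumes spaces (including non-breaking ones) group thousands and a comma
    marks the decimal point. Returns None when the text is not a plain number.
    """
    cleaned = raw.replace("\xa0", "").replace(" ", "").replace(",", ".")
    try:
        return float(cleaned)
    except ValueError:
        return None


print(parse_number("2 730"))
print(parse_number("12,5"))
print(parse_number("n/a"))
```

Applied to the table, df["value"].map(parse_number) would give a numeric column, with None for entries that carry units or other text.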
