How to scrape <p> tags inside <div> tags that have a class/id from HTML using Python

Question

I want to extract data such as:

Release date: June 16, 2016
Vulnerability identifier: APSB16-23
Priority: 3
CVE number: CVE-2016-4126

from https://helpx.adobe.com/security/products/air/apsb16-23.ug.html

The code:

import requests
from bs4 import BeautifulSoup as bs
from pprint import pprint
    
r = requests.get('https://helpx.adobe.com/cy_en/security/products/air/apsb16-31.html')
soup = bs(r.content, 'html.parser')
pprint([i.text for i in soup.select('div > .text >  p' , limit = 4 )] )

The output:

['Release date:\xa0September 13, 2016',
 'Vulnerability identifier: APSB16-31',
 'Priority: 3',
 'CVE number:\xa0CVE-2016-6936']

The problem is that the output contains \xa0. How can I remove it? Is there a more efficient way to write this code? I would also like to save the result to a CSV file. Thank you.

baduker · Accepted Answer

\xa0 is the non-breaking space character in Latin-1 (ISO 8859-1), the same as chr(160). Replace it with a regular space.
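To see what that character is without hitting the network, here is a small sketch that works on one of the strings from the question's output. The variable names are only illustrative. It also shows unicodedata.normalize with the NFKC form as an alternative to a targeted replace; that is a broader operation, since it folds other compatibility characters too, so the plain replace is the narrower choice if only \xa0 is a problem.

```python
import unicodedata

# One of the raw strings from the question's output.
raw = "Release date:\xa0September 13, 2016"

# "\xa0" is U+00A0 NO-BREAK SPACE, the same character as chr(160).
is_same_char = "\xa0" == chr(160)
char_name = unicodedata.name("\xa0")

# Option 1: replace only the no-break space with a regular space.
replaced = raw.replace("\xa0", " ")

# Option 2: NFKC normalization maps the no-break space to a regular
# space (and also folds other compatibility characters).
normalized = unicodedata.normalize("NFKC", raw)

print(replaced)
```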

Try this:

import requests
from bs4 import BeautifulSoup as bs
from pprint import pprint

r = requests.get('https://helpx.adobe.com/cy_en/security/products/air/apsb16-31.html')
soup = bs(r.content, 'html.parser')
pprint([i.text.replace(u'\xa0', u' ') for i in soup.select('div > .text >  p', limit=4)])

Output:

['Release date: September 13, 2016',
 'Vulnerability identifier: APSB16-31',
 'Priority: 3',
 'CVE number: CVE-2016-6936']
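If you would rather look values up by field name than carry a list of strings around, one possible follow-up is to split each cleaned line at its first colon. This is only a sketch: to_record is a hypothetical helper name, and the input list is copied from the output in the question rather than fetched from the page.

```python
# Raw strings as shown in the question's output (hardcoded here so the
# sketch runs without a network request).
lines = [
    "Release date:\xa0September 13, 2016",
    "Vulnerability identifier: APSB16-31",
    "Priority: 3",
    "CVE number:\xa0CVE-2016-6936",
]


def to_record(lines):
    """Hypothetical helper: turn 'Field: value' strings into a dict."""
    record = {}
    for line in lines:
        cleaned = line.replace("\xa0", " ")
        # partition splits at the first colon only, so a colon inside
        # the value would be left untouched.
        key, _, value = cleaned.partition(":")
        record[key.strip()] = value.strip()
    return record


record = to_record(lines)
print(record["CVE number"])
```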

EDIT: To write the result to a .csv file, use pandas.

Here's how:

import pandas as pd
import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://helpx.adobe.com/cy_en/security/products/air/apsb16-31.html')
soup = bs(r.content, 'html.parser')
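# Clean each line and split it into a [field, value] pair at the first ": " only.
release = [
    i.get_text().replace('\xa0', ' ').split(": ", 1)
    for i in soup.select('div > .text > p', limit=4)
]
# Use the field names as the header and write the values as a single row.
pd.DataFrame(release).set_index(0).T.to_csv("release_data.csv", index=False)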

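Output: release_data.csv contains the four field names as its header row, followed by a single row with the corresponding values.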
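If pandas feels heavy for a single row, the standard library's csv module can probably do the same job. The sketch below is an alternative, not what the answer above uses: write_record is a hypothetical helper, and the record is hardcoded from the cleaned output shown earlier. It writes to an in-memory buffer so the result can be inspected.

```python
import csv
import io

# Cleaned field/value pairs, taken from the output shown earlier.
record = {
    "Release date": "September 13, 2016",
    "Vulnerability identifier": "APSB16-31",
    "Priority": "3",
    "CVE number": "CVE-2016-6936",
}


def write_record(record, fileobj):
    """Hypothetical helper: write one dict as a header row plus a data row."""
    writer = csv.DictWriter(fileobj, fieldnames=list(record))
    writer.writeheader()
    writer.writerow(record)


# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
write_record(record, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file instead, pass a file opened with open("release_data.csv", "w", newline="", encoding="utf-8") in place of the buffer. The date value contains a comma, so the csv module quotes it automatically.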
