Reputation:
Does anyone know why this code doesn't do the job? It works perfectly when I scrape the smaller files that only contain data from a single year (e.g. only 2017), but not with this one. Is this file too big or something? There's no error or anything like that. When I run the script against one of those smaller files, it takes about 30 seconds to download everything and save it to the database, so I don't think there are mistakes in the code. With this file, I just get "Process finished with exit code 0" and nothing more.
from bs4 import BeautifulSoup
import urllib.request
from app import db
from models import CveData
from sqlalchemy.exc import IntegrityError

url = "https://cve.mitre.org/data/downloads/allitems.xml"
r = urllib.request.urlopen(url)
xml = BeautifulSoup(r, 'xml')

vuln = xml.find_all('Vulnerability')
for element in vuln:
    notes = element.find_all('Notes')
    title = element.find('CVE').text
    # use a separate name so the outer loop variable is not shadowed
    for note in notes:
        desc = note.find(Type="Description").text
        test_date = note.find(Title="Published")
        if test_date is None:
            # skip notes without a publication date
            continue
        date = test_date.text
        data = CveData(title, date, desc)
        try:
            db.session.add(data)
            db.session.commit()
            print("adding... " + title)
        # don't stop the stream, ignore the duplicates
        except IntegrityError:
            db.session.rollback()
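A loop over an empty find_all() result simply does nothing, which matches the silent "exit code 0" described above. One way to make that visible is to check for matches before looping. The sketch below is hypothetical: it swaps in the standard library's xml.etree.ElementTree instead of BeautifulSoup so it runs without extra packages, and the helper name find_vulnerabilities is made up for illustration. It strips XML namespaces when comparing tag names and, if nothing matches, reports the root element so you can see which format the file actually uses.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix so 'Vulnerability' matches namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def find_vulnerabilities(source):
    """Return all <Vulnerability> elements, or raise if there are none.

    `source` is a filename or a binary file-like object. Raising instead of
    looping over an empty list turns a silent exit into a visible error.
    """
    root = ET.parse(source).getroot()
    found = [el for el in root.iter() if local_name(el.tag) == "Vulnerability"]
    if not found:
        raise ValueError(
            f"no <Vulnerability> elements found; root element is <{local_name(root.tag)}>"
        )
    return found


# Usage with an in-memory sample shaped like the per-year files (assumed layout)
sample = b'<cvrfdoc xmlns="urn:example"><Vulnerability/><Vulnerability/></cvrfdoc>'
print(len(find_vulnerabilities(io.BytesIO(sample))))
```

Note that ET.parse loads the whole document into memory; for a very large download a streaming parser such as ET.iterparse is the gentler choice.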
Upvotes: 0
Views: 156
Reputation: 2093
I downloaded the file that you said didn't work and the one you said did, and ran these two greps, which give very different results:
grep -c "</Vulnerability>" allitems-cvrf-year-2019.xml
21386
grep -c "</Vulnerability>" allitems.xml
0
The program is not stopping when it opens the file; it runs to completion. You aren't getting any output because there are no Vulnerability tags in allitems.xml, so the body of your loop never executes. (My grep is not strictly accurate, since a closing tag may contain whitespace, such as </Vulnerability >, but I doubt that is the case here.)
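If you want a count that does not depend on how the tags are laid out, you can let an XML parser do it. Note also that grep -c counts matching lines, not matches, so several closing tags on one line would be counted once. The sketch below is a hypothetical helper (count_elements is not from any library) built on the standard library's ElementTree.iterparse: it streams the file, compares tag names with any namespace stripped, and tolerates whitespace inside tags because the parser handles that.

```python
import io
import xml.etree.ElementTree as ET


def count_elements(source, name):
    """Count elements whose local name equals `name`, streaming the file.

    `source` is a filename or binary file-like object. Namespaces are ignored,
    and whitespace inside tags is handled by the parser, unlike a text grep.
    """
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag.rsplit("}", 1)[-1] == name:
            count += 1
        elem.clear()  # drop text/children of finished elements to limit memory
    return count


# Two elements, one written with whitespace inside its tags
sample = b'<doc xmlns="urn:example"><Vulnerability/><Vulnerability\n>x</Vulnerability\n></doc>'
print(count_elements(io.BytesIO(sample), "Vulnerability"))
```

Against the downloaded files you would call something like count_elements("allitems.xml", "Vulnerability"); a result of 0 would confirm that the scraper needs to target whatever elements that file actually uses.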
Upvotes: 1