Reputation: 15
This is the HTML code:
<div xmlns="" style="box-sizing: border-box; width: 100%; margin: 0 0 10px 0; padding: 5px 10px; background: #fdc431; font-weight: bold; font-size: 14px; line-height: 20px; color: #fff;">42263 - Unencrypted Telnet Server</div>
I am trying to print 42263 - Unencrypted Telnet Server
using Beautiful Soup, but the output is an empty list, i.e. []
This is my Python code:
from bs4 import BeautifulSoup
import csv
import urllib.request as urllib2
with open(r"C:\Users\sourabhk076\Documents\CBS_1.html") as fp:
    soup = BeautifulSoup(fp.read(), 'html.parser')
divs = soup.find_all('div', attrs={'background':'#fdc431'})
print(divs)
Upvotes: 1
Views: 586
Reputation: 3118
Solution with regexes:
from bs4 import BeautifulSoup
import re
with open(r"C:\Users\sourabhk076\Documents\CBS_1.html") as fp:
    soup = BeautifulSoup(fp.read(), 'html.parser')
Let's find the div whose style attribute matches the following regular expression: background:\s*#fdc431;
Here \s matches a single Unicode whitespace character. I assumed there can be zero or more whitespace characters after the colon, so I added the * quantifier, which matches zero or more repetitions of the preceding expression. It is worth reading more about regexes, as they sometimes come in handy, and an online regex tester helps when experimenting with a pattern.
div = soup.find('div', attrs={'style': re.compile(r'background:\s*#fdc431;')})
This however is equivalent to:
div = soup.find('div', style=re.compile(r'background:\s*#fdc431;'))
You can read about that in the official BeautifulSoup documentation. The sections about the kinds of filters you can pass to find and other similar methods are also worth reading. You can supply a string, a regular expression, a list, True, or a function, as shown by Keyur Potdar in his answer.
Assuming the div exists, we can get its text with:
>>> div.text
'42263 - Unencrypted Telnet Server'
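To make the filter kinds mentioned above concrete, here is a minimal sketch that applies a string, a regular expression, True, and a function to the style attribute. The sample markup is a shortened, hypothetical version of the div from the question, with a second div added that has no style attribute.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup: one styled div and one div without a style attribute.
html = (
    '<div style="padding: 5px 10px; background: #fdc431; color: #fff;">'
    '42263 - Unencrypted Telnet Server</div>'
    '<div>No style here</div>'
)
soup = BeautifulSoup(html, 'html.parser')

# String: matches only when the whole attribute value is identical to it.
exact = soup.find_all('div', style='background: #fdc431;')

# Regular expression: searched within the value, so a fragment is enough.
by_regex = soup.find_all('div', style=re.compile(r'background:\s*#fdc431;'))

# True: matches any div that has a style attribute at all.
any_style = soup.find_all('div', style=True)

# Function: receives the attribute value, which may be None when the attribute is missing.
by_function = soup.find_all(
    'div', style=lambda value: value is not None and '#fdc431' in value
)

print(len(exact), len(by_regex), len(any_style), len(by_function))
```

The string filter finds nothing here because the style value contains other declarations too, which is why a regex or a function is the more practical choice for inline styles.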
Upvotes: 2
Reputation: 7238
background is not an attribute of the div tag; it is a CSS property inside the value of the style attribute. The attributes of the div tag are:
{'xmlns': '', 'style': 'box-sizing: border-box; width: 100%; margin: 0 0 10px 0; padding: 5px 10px; background: #fdc431; font-weight: bold; font-size: 14px; line-height: 20px; color: #fff;'}
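You can check this yourself by inspecting the tag's attrs mapping. A minimal sketch, using a shortened, hypothetical version of the div from the question:

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical version of the markup from the question.
html = '<div xmlns="" style="background: #fdc431; color: #fff;">42263 - Unencrypted Telnet Server</div>'
soup = BeautifulSoup(html, 'html.parser')
div = soup.find('div')

# attrs is a dictionary-like mapping keyed by attribute name.
print(sorted(div.attrs))          # names of the attributes present on the tag
print('background' in div.attrs)  # False: background only appears inside the style value
print('style' in div.attrs)       # True
```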
So you'll either have to match the full value of the style attribute:
soup.find_all('div', attrs={'style': 'box-sizing: border-box; width: 100%; margin: 0 0 10px 0; padding: 5px 10px; background: #fdc431; font-weight: bold; font-size: 14px; line-height: 20px; color: #fff;'})
or you can use a lambda function to check whether background: #fdc431 is in the style attribute value, like this:
from bs4 import BeautifulSoup
soup = BeautifulSoup('<div xmlns="" style="box-sizing: border-box; width: 100%; margin: 0 0 10px 0; padding: 5px 10px; background: #fdc431; font-weight: bold; font-size: 14px; line-height: 20px; color: #fff;">42263 - Unencrypted Telnet Server</div>', 'html.parser')
# t.get('style', '') avoids a KeyError on div tags that have no style attribute
print(soup.find(lambda t: t.name == 'div' and 'background: #fdc431' in t.get('style', '')).text)
# 42263 - Unencrypted Telnet Server
or, you can use RegEx, as shown by Jatimir in his answer.
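If the spacing or letter case inside the style value can vary, a function filter can also parse the declarations before comparing. This is only a rough sketch: parse_style and has_background are hypothetical helper names, the sample markup is made up, and the naive split on ';' would not cope with values that themselves contain semicolons.

```python
from bs4 import BeautifulSoup


def parse_style(style):
    """Split an inline style string into a {property: value} dict (hypothetical helper)."""
    declarations = {}
    for part in (style or '').split(';'):
        if ':' in part:
            name, value = part.split(':', 1)
            declarations[name.strip().lower()] = value.strip()
    return declarations


def has_background(tag, colour):
    """Return True for div tags whose background property equals colour, ignoring case."""
    if tag.name != 'div':
        return False
    background = parse_style(tag.get('style')).get('background', '')
    return background.lower() == colour.lower()


# Hypothetical markup: no space after the colon and an upper-case colour value.
html = (
    '<div style="padding: 5px 10px; background:#FDC431; color: #fff;">'
    '42263 - Unencrypted Telnet Server</div>'
    '<div>No style here</div>'
)
soup = BeautifulSoup(html, 'html.parser')
div = soup.find(lambda tag: has_background(tag, '#fdc431'))
print(div.text if div is not None else 'no match')
```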
Upvotes: 2