Reputation: 191
import requests
from bs4 import BeautifulSoup

URL = "https://www.worldometers.info/coronavirus/"
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html5lib')
countHTML = soup.find('div', attrs = {'class':'content-inner'})
for countVar in countHTML.findAll('div', attrs = {'class':'maincounter-number'}):
    count = countVar.span
Right now, the variable count returns:
<span style="color:#aaa">270,069</span>
<span>11,271</span>
<span>90,603</span>
I need help extracting 3 separate integers from this output. I have tried count[0], but this is not an array, so it does not work.
String1 = "270,069"
String2 = "11,271"
String3 = "90,603"
Then convert them into 3 integers by removing the commas:
Int1 = 270069
Int2 = 11271
Int3 = 90603
Perhaps Regex will help?
Edit:
I currently have all the numbers as one value in a list, such as
numbers = """
270069
11271
90603"""
So if I do numbers[0], all 3 integers show up as 1 value. How do I strip the newlines and turn them into a list or array with 3 separate values?
Upvotes: 3
Views: 274
Reputation: 66
You could use the split method as follows:
intAsString = '123\n1234\n12345'
listOfInts = intAsString.split('\n')
Here, listOfInts would be ['123', '1234', '12345'].
In Python, \n is the newline character, so splitting on it gives you the three numbers as separate strings.
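The split above still leaves strings, so one more step is needed to get integers. A minimal sketch using the multi-line string from the question's edit (the name number_list is only illustrative, and it assumes every line holds nothing but digits):

```python
# The value from the question's edit: one number per line, with a leading newline.
numbers = """
270069
11271
90603"""

# split() with no arguments splits on any run of whitespace and drops empty
# pieces, so the leading newline does not produce an empty string.
number_list = [int(part) for part in numbers.split()]

print(number_list)  # [270069, 11271, 90603]
```

Calling split() without an argument instead of split('\n') avoids the empty first element that the leading newline would otherwise create, which int() would reject.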
Upvotes: 0
Reputation: 1048
Yep, some simple Regex should work.
import re

s = '''<span style="color:#aaa">270,069</span>
<span>11,271</span>
<span>90,603</span>'''
num_strs = re.findall('[0-9,]+', s)
numbers = [int(ns.replace(',', '')) for ns in num_strs]
# Extract to variables
num1, num2, num3 = numbers
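One caveat: the pattern [0-9,]+ matches any run of digits and commas, including ones inside attributes. As a sketch of a stricter variant, the pattern below captures only the text between the span tags. The color value #333 is a made-up example chosen to show the problem; it is not what the page actually returns.

```python
import re

# Hypothetical input: same numbers as in the question, but with a style
# attribute that contains digits (an assumption, to illustrate the caveat).
html = '''<span style="color:#333">270,069</span>
<span>11,271</span>
<span>90,603</span>'''

# The loose pattern also picks up the digits inside the attribute.
loose = re.findall(r'[0-9,]+', html)

# Capture only a digits-and-commas run sitting directly between <span ...> and </span>.
pattern = re.compile(r'<span[^>]*>\s*([0-9][0-9,]*)\s*</span>')
counts = [int(text.replace(',', '')) for text in pattern.findall(html)]

print(loose)   # ['333', '270,069', '11,271', '90,603']
print(counts)  # [270069, 11271, 90603]
```

For anything beyond a quick one-off, reading the text from the parsed span elements is likely more robust than running a regex over raw HTML.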
Upvotes: 1
Reputation: 17322
You could use:
my_numbers = []
for countVar in countHTML.findAll('div', attrs = {'class':'maincounter-number'}):
    # Take the span's text, trim whitespace, drop the thousands separators, then convert
    my_numbers.append(int(countVar.span.text.strip().replace(',', '')))
print(my_numbers)
output:
[270104, 11272, 90603]
Upvotes: 1