Reputation: 85
I'm trying to scrape a site whose content is loaded via AJAX. I'm just learning Python, so sorry if this is an easy question.
I use Selenium to load the page and grab a piece of its HTML. That part works exactly as I want, but I don't know how to parse the data.
I would like the data to look like this (ideally stored in variables, because I then want to transfer it to a MySQL database):
Custom ID:
Name:
Ticket NO:
Rate:
Win:
Where the data sits in the HTML:
<li class="message">
  <div customid="CUSTOM ID">
    <span class="name nc-mark-user">NAME</span>
    <p>
      <span><img src="https://cht.sts.pl/assets/img/accepted.svg" width="15" height="15"> <span class="nc-ticket" onclick="serchTicketHandler('TICKET NO')">RATE / WIN zł</span></span>
    </p>
  </div>
</li>
My Python code:
import time
from selenium import webdriver
from bs4 import BeautifulSoup
from xml.dom import minidom
options = webdriver.ChromeOptions()
options.add_argument('headless')
browser = webdriver.Chrome(
    "C:/Users/backu/Downloads/chromedriver_win32/chromedriver.exe",
    chrome_options=options)
browser.get("https://www.sts.pl/pl/oferta/zaklady-live/")
time.sleep(1)
element = browser.find_element_by_class_name("nc-message-holder")
source = element.get_attribute('innerHTML')
print(source)
browser.close()
I don't know how to process this HTML to extract the data I want.
Thank you very much for any answers.
Upvotes: 0
Views: 131
Reputation: 1710
from time import sleep
from selenium import webdriver
from bs4 import BeautifulSoup

options = webdriver.ChromeOptions()
options.add_argument('headless')
browser = webdriver.Chrome(
    "C:/Users/backu/Downloads/chromedriver_win32/chromedriver.exe",
    chrome_options=options)
browser.get("https://www.sts.pl/pl/oferta/zaklady-live/")
sleep(1)
source = browser.page_source  # Get the entire page source from the browser
browser.close()  # The browser is no longer needed, so close it
soup = BeautifulSoup(source, 'html.parser')
try:
    tags = soup.select('ul.nc-message-holder li.message')  # Get the message elements using CSS selectors
    for tag in tags:  # Loop through the messages
        customerId = tag.find('div').get('customid')
        name = tag.find('div').find('span').text
        # Defaults, so a message without a ticket does not reuse values from the previous one
        ticketNum = rate = win = ''
        # <span class="nc-ticket" onclick="serchTicketHandler('223461999015343335')">8.00 / 51.04 zł</span>
        ticketTag = tag.select('span.nc-ticket')
        if ticketTag:
            # Strip the JavaScript call around the ticket number
            ticketNum = ticketTag[0].get('onclick').replace("serchTicketHandler('", "").replace("')", "")
            rate_Win = ticketTag[0].text
            if '/' in rate_Win:
                # The text has the form "RATE / WIN zł"
                rate_Win = rate_Win.split('/')
                rate = rate_Win[0].strip()
                win = rate_Win[1].strip()
            else:
                rate = rate_Win
                win = ''
        print('\n\ncustomerId ==>', customerId)
        print('name ==>', name)
        print('ticketNum ==>', ticketNum)
        print('rate ==>', rate)
        print('win ==>', win)
except Exception as e:
    print(e)
Output:
customerId ==> c46654fa66765ae11bb34d7d99cf0a77
name ==> Wojciech W
ticketNum ==> 223461999016744267
rate ==> 100.00
win ==> 1340.24 zł
customerId ==> 7b071de240b730ad42cee50711dd8c72
name ==> Grzegorz P
ticketNum ==> 223461988025841282
rate ==> 15.94
win ==> 46.28 zł
customerId ==> 244950ab8485b7180c177a2b7b19b0ae
name ==> Michał J
ticketNum ==> 313441988030838257
rate ==> 12.00
win ==> 73967.98 zł
customerId ==> 9223e1c2f87afb02e6c704acb53308da
name ==> Piotr G
ticketNum ==> 313431999017162038
rate ==> 2.00
win ==> 430.40 zł
customerId ==> 4a8e2695fe71a084f69167ac987c7013
name ==> Dawid B
ticketNum ==> 313461988013246357
rate ==> 10.00
win ==> 1569.30 zł
customerId ==> 6b882a5ef93e0c3e52b81bbee0ba52af
name ==> Adrian P
ticketNum ==> 313441988034262951
rate ==> 2.00
win ==> 451268.63 zł
customerId ==> abd34ea0c7a9b0e07a53a78324cb7e0a
name ==> Michał D
ticketNum ==> 223461999013746135
rate ==> 10.00
win ==> 27.72 zł
customerId ==> bed4fc0ea1f21a7a9b1c6762d2302d09
name ==> Rafał Ż
ticketNum ==> 223461988021146803
rate ==> 607.40
win ==> 2150.26 zł
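Since the question mentions keeping the values in variables for a later database insert, here is a minimal sketch of how the string handling above could be turned into structured records. It uses only the standard library and assumes the `onclick` attribute and the span text always look like the examples in this thread. The helper names `parse_ticket` and `build_record` and the dictionary keys are hypothetical, chosen only for illustration.

```python
import re
from decimal import Decimal

# onclick attribute as seen in the page, e.g. serchTicketHandler('223461999015343335')
TICKET_RE = re.compile(r"serchTicketHandler\('(\d+)'\)")
# Span text as seen in the page, e.g. "8.00 / 51.04 zł"
AMOUNTS_RE = re.compile(r"(\d+(?:\.\d+)?)\s*/\s*(\d+(?:\.\d+)?)")


def parse_ticket(onclick, text):
    """Return (ticket_no, rate, win); a part that cannot be parsed is None."""
    ticket_match = TICKET_RE.search(onclick or "")
    amounts_match = AMOUNTS_RE.search(text or "")
    ticket_no = ticket_match.group(1) if ticket_match else None
    rate = Decimal(amounts_match.group(1)) if amounts_match else None
    win = Decimal(amounts_match.group(2)) if amounts_match else None
    return ticket_no, rate, win


def build_record(custom_id, name, onclick, text):
    """Bundle one chat message into a dictionary (hypothetical key names)."""
    ticket_no, rate, win = parse_ticket(onclick, text)
    return {
        "custom_id": custom_id,
        "name": name,
        "ticket_no": ticket_no,
        "rate": rate,
        "win": win,
    }


# Example using values taken from the output above
records = [
    build_record(
        "c46654fa66765ae11bb34d7d99cf0a77",
        "Wojciech W",
        "serchTicketHandler('223461999016744267')",
        "100.00 / 1340.24 zł",
    )
]
print(records[0]["ticket_no"], records[0]["rate"], records[0]["win"])
```

Inside the loop, the same call would take `customerId`, `name`, `ticketTag[0].get('onclick')` and `ticketTag[0].text`. The ticket number is kept as a string so no digits are lost, and `Decimal` avoids floating-point rounding on the amounts, which may matter once the values go into a database column.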
Upvotes: 1