Reputation: 35
from bs4 import BeautifulSoup
import requests
import random
id_url = "https://codeforces.com/profile/akash77"
id_headers = {
"User-Agent": 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'}
id_page = requests.get(id_url, headers=id_headers)
id_soup = BeautifulSoup(id_page.content, 'html.parser')
id_soup = id_soup.find('svg')
print(id_soup)
I'm getting None as the output for this.
If I parse the <div> element that contains this <svg> tag, the contents of the <div> are not printed. find() works for all other HTML tags, but not for the <svg> tag.
Upvotes: 2
Views: 2202
Reputation: 425
Late answer, but try:
id_soup = id_soup.find('svg:svg')
print(id_soup)
SVG is an XML dialect, so Beautiful Soup will annotate the tag names with the name of the namespace.
Upvotes: 0
Reputation: 1411
If you just want the data, it is already there in the HTML. This isn't pretty, but it works, and it is much quicker and easier than browser automation:
import requests
import json
url = 'https://codeforces.com/profile/akash77'
resp = requests.get(url)
start = "$('#userActivityGraph').empty().calendar_yearview_blocks("
end = "start_monday: false"
s = resp.text
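# Slice out the JavaScript object literal passed to calendar_yearview_blocks(...)
raw = s[s.find(start) + len(start):s.rfind(end)].strip()[:-1]
# Quote the bare keys and strip whitespace so the literal parses as JSON
svg_data = (raw.replace('items', '"items"')
               .replace('data', '"data"')
               .replace('\n', '')
               .replace('\t', '')
               .replace(' ', ''))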
broken = svg_data+'}'
json_data = json.loads(broken)
print(json_data)
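The slicing approach above can be made a little more defensive. The following is only a sketch: `SAMPLE_HTML` is a hypothetical stand-in for the inline script on the profile page (the real page may be shaped differently), and `extract_activity` is an illustrative helper name, not something from the site or from any library. It quotes bare keys with a regular expression instead of replacing the words `items` and `data` everywhere, and it returns None when the markers are missing.

```python
import json
import re

# Hypothetical stand-in for the inline script; the real page may differ.
SAMPLE_HTML = """
<script>
$('#userActivityGraph').empty().calendar_yearview_blocks({
    data: {
        "2021-01-05": { items: ["a", "b"] },
        "2021-01-06": { items: ["c"] }
    },
    start_monday: false,
    always_show_tooltip: true
});
</script>
"""

START = "$('#userActivityGraph').empty().calendar_yearview_blocks("
END = "start_monday: false"


def extract_activity(page_text):
    """Return the embedded activity data as a dict, or None if the markers are absent."""
    start_pos = page_text.find(START)
    end_pos = page_text.rfind(END)
    if start_pos == -1 or end_pos == -1:
        return None
    literal = page_text[start_pos + len(START):end_pos].strip()
    # Drop the comma left in front of the cut-off key, then close the outer object.
    literal = literal.rstrip(',') + '}'
    # Quote bare keys: an identifier that follows "{" or "," and precedes ":".
    literal = re.sub(r'([{,]\s*)([A-Za-z_]\w*)(\s*:)', r'\1"\2"\3', literal)
    return json.loads(literal)


if __name__ == "__main__":
    print(extract_activity(SAMPLE_HTML))
```

To try it on the live page you would pass `resp.text` instead of `SAMPLE_HTML`. Unlike removing every space, this leaves string values that contain spaces untouched, but it still assumes the values themselves are already valid JSON.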
Upvotes: 1
Reputation: 275
The page is rendered dynamically with JavaScript, so you will need Selenium to get the rendered page.
First, install the libraries:
pip install selenium
pip install webdriver-manager
Then you can use Selenium to load the full page:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
s=Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=s)
driver.maximize_window()
driver.get('https://codeforces.com/profile/akash77')
elements = driver.find_elements(By.XPATH, '//*[@id="userActivityGraph"]')
elements is a list of Selenium WebElement objects, so we need to get the HTML out of each one.
svg = [element.get_attribute('innerHTML') for element in elements]
This gives you the svg and all the elements inside it.
Sometimes you need to run the browser in headless mode (without opening a Chrome window). For that, you can pass a 'headless' option to the driver.
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument('headless')
# then pass options to the driver
driver = webdriver.Chrome(service=s, options=options)
Upvotes: 1
Reputation: 71
The svg tag is not included in the page source; it is rendered by JavaScript.
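A small offline sketch of that point follows. It uses the standard library's `html.parser`, the same parser the question passes to Beautiful Soup, so it runs without extra dependencies. Both markup strings and the `tags_in` helper are made up for illustration and are not taken from the Codeforces page.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the name of every start tag the parser sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tags_in(markup):
    """Return the start-tag names found in a markup string, in document order."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


# Hypothetical markup: the svg is part of the page source.
STATIC_SVG = '<div id="graph"><svg width="10"><rect/></svg></div>'

# Hypothetical markup: the svg would only exist after a browser runs the script.
SCRIPTED_SVG = (
    '<div id="graph"></div>'
    '<script>document.getElementById("graph").innerHTML = "<svg></svg>";</script>'
)

if __name__ == "__main__":
    print(tags_in(STATIC_SVG))
    print(tags_in(SCRIPTED_SVG))
```

In the second case the parser only sees the empty div and the script element. The script body is treated as plain text and never executed, which is consistent with find('svg') returning None on a page that builds its graph in the browser.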
Upvotes: 1