Reputation: 47
I want to use Selenium to open the URL given below, click the first link in the list, and get its text data.
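병역법위반 [대법원 2018. 11. 1., 선고, 2016도10912, 전원합의체 판결] (in English: Violation of the Military Service Act [Supreme Court, decided November 1, 2018, case 2016도10912, en banc judgment])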
This is the link as it appears on that web page. I have tried pretty much every method I could find online. Is it possible that this web page is somehow protected?
from selenium import webdriver
from bs4 import BeautifulSoup
# selenium webdriver chrome
driver = webdriver.Chrome("chromedriver.exe")
# open the URL
driver.get("http://law.go.kr/precSc.do?tabMenuId=tab103&query=")
elem = driver.find_elements_by_css_selector("#viewHeightDiv > table > tbody > tr:nth-child(1) > td.s_tit > a")
if len(elem):
    elem.click()
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
notices = soup.find('div', id='bodyContent')
for n in notices:
    print(n)
With my code, Selenium opens the browser and goes to the URL, but it does not click the link I want, so the data that gets printed is not what I am looking for.
I want to know how to crawl http://law.go.kr/precSc.do?tabMenuId=tab103&query=
Maybe there is a way to do this without Selenium? I picked Selenium because this site's URLs are not fixed. The last URL that is fixed is http://law.go.kr/precSc.do?tabMenuId=tab103&query=
Upvotes: 0
Views: 222
Reputation: 12255
Here is code with the necessary waits to click the link and get the text:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)
driver.get("http://law.go.kr/precSc.do?tabMenuId=tab103&query=")
# Wait for the first link in #viewHeightDiv to be visible. This is necessary to read its text.
elem = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#viewHeightDiv a")))
# Get the first word of the link text. It will be used to check that the page has loaded, by looking for it in the heading.
title = elem.text.strip().split(" ")[0]
elem.click()
# Wait for the h2 to contain the title captured above.
wait.until(EC.text_to_be_present_in_element((By.CSS_SELECTOR, "#viewwrapCenter h2"), title))
content = driver.find_element(By.CSS_SELECTOR, "#viewwrapCenter").text
print(content)
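A likely reason the code in the question never clicks anything: `find_elements_by_css_selector` (plural) returns a Python list, and a list has no `click()` method, so `elem.click()` raises `AttributeError` instead of clicking. The sketch below shows the idea of clicking the first item of such a list. It runs without a browser; `click_first` and `FakeElement` are hypothetical names made up for illustration and are not part of Selenium.

```python
from typing import Any, Sequence


def click_first(elements: Sequence[Any]) -> bool:
    """Click the first element of a find_elements-style result.

    Returns True if something was clicked, False if the list was empty.
    """
    if not elements:
        return False
    elements[0].click()  # click the element itself, not the list
    return True


class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (illustration only)."""

    def __init__(self) -> None:
        self.clicked = False

    def click(self) -> None:
        self.clicked = True


link = FakeElement()
result_nonempty = click_first([link])      # clicks the fake link
result_empty = click_first([])             # nothing to click
list_has_click = hasattr([link], "click")  # a plain list cannot be clicked
```

With a real driver you would pass the list returned by the plural find call to `click_first`. The waits in the code above are still needed, since the list may be empty until the page has finished rendering.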
Upvotes: 1