Kwanhee Hwang

Reputation: 47

Web crawling problem: using Selenium to click on a link

I want to use Selenium to open the URL I specified, click the first link in the list, and get its text data.

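병역법위반  [대법원 2018. 11. 1., 선고, 2016도10912, 전원합의체 판결] (in English: "Violation of the Military Service Act [Supreme Court, decided 2018. 11. 1., 2016도10912, en banc decision]")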

This is the link on that web page. I have tried pretty much every method I could find online. Is it possible that this web page is somehow protected?

from selenium import webdriver
from bs4 import BeautifulSoup
# selenium webdriver chrome


driver = webdriver.Chrome("chromedriver.exe")

# get url
driver.get("http://law.go.kr/precSc.do?tabMenuId=tab103&query=")


elem = driver.find_elements_by_css_selector("""#viewHeightDiv > table > 
tbody > tr:nth-child(1) > td.s_tit > a""")
if len(elem):
    elem.click()

html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
notices = soup.find('div', id='bodyContent')

for n in notices:
    print(n)

With my code, Selenium opens the browser and goes to the URL, but it does not click the link I want, so the printed data is not what I was looking for.

I want to know how to crawl http://law.go.kr/precSc.do?tabMenuId=tab103&query=

Maybe there is a way to do this without Selenium? I picked Selenium because this site's URLs are not fixed. The last URL that is fixed is http://law.go.kr/precSc.do?tabMenuId=tab103&query=

Upvotes: 0

Views: 222

Answers (1)

Sers

Reputation: 12255

Here is code with the necessary waits to click the link and get the text:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)

driver.get("http://law.go.kr/precSc.do?tabMenuId=tab103&query=")

# Wait until the first link in viewHeightDiv is visible. Visibility is necessary to read its text.
elem = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#viewHeightDiv a")))
# Get the first word of the link text. It is used below to check that the detail page has loaded.
title = elem.text.strip().split(" ")[0]

elem.click()
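# Wait for the h2 to contain the title captured above.
wait.until(EC.text_to_be_present_in_element((By.CSS_SELECTOR, "#viewwrapCenter h2"), title))

# find_element(By.CSS_SELECTOR, ...) is used because find_element_by_css_selector was removed in newer Selenium 4 releases.
content = driver.find_element(By.CSS_SELECTOR, "#viewwrapCenter").text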
print(content)
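
As for why the code in the question did not click anything: find_elements_by_css_selector returns a list, and a list has no click() method, so elem.click() cannot work even when the link is found. Without a wait, the list may also simply be empty at the moment of the query, which would skip the click silently. Below is a minimal sketch of that difference in plain Python. FakeElement and click_first are hypothetical names used only for illustration; FakeElement stands in for a Selenium WebElement so the snippet runs without a browser.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (not a Selenium class)."""

    def __init__(self, text):
        self.text = text
        self.clicked = False

    def click(self):
        self.clicked = True


def click_first(elements):
    """Click the first item of a list such as find_elements returns.

    Returns the clicked element, or None when the list is empty.
    """
    if not elements:
        return None
    first = elements[0]
    first.click()
    return first


# Simulated result of a find_elements call: always a list, possibly empty.
found = [FakeElement("병역법위반  [대법원 2018. 11. 1., 선고, 2016도10912, 전원합의체 판결]")]

# What the question's code did: call click() on the list itself.
try:
    found.click()
except AttributeError as err:
    print("A list cannot be clicked:", err)

# Clicking the first element instead works.
clicked = click_first(found)
# Same first-word extraction as in the answer's code.
title = clicked.text.strip().split(" ")[0]
print(title)
```

With a real driver, the equivalent fix is to index the list (elem[0].click()) or to use a single-element lookup after a wait, as in the code above.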

Upvotes: 1
