I am trying to scrape the subject titles of all the forum posts on this website, but I am not sure how to go about it because I am not familiar with the HTML structure of this forum.
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'http://thailove.net/bbs/board.php?bo_table=ent'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
#I don't think this is correct, but not sure how else to do this...
containers = page_soup.findAll("td",{"class":"td_subject"})
for container in containers:
    subject = container.a.font.font.contents
    #similarly not sure this is correct
print("subject: ", subject)
Please let me know what I should do. Also keep in mind that the website is in Korean but can be easily translated into English if need be.
Your code is good until you get to the for loop: you should be accessing container.a.contents[0] to get the subjects, and the print function should be inside your for loop:
for container in containers:
    subject = container.a.contents[0]
    print("subject: ", subject)
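Running the script then gives:
>>>
subject:
미성년자도 이용하는 게시판이므로 글 수위를 지켜주세요.
subject:
방콕의 대표 야시장 - 딸랏롯파이2
subject:
공항에서 제일 가까운 레드썬 마사지
.......
(In English, these titles read roughly: "This board is also used by minors, so please keep your posts appropriate.", "Bangkok's signature night market - Talad Rot Fai 2", and "Red Sun Massage, the closest to the airport".)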
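Here is a minimal offline sketch of the same approach that runs without the live site. The HTML string is a hypothetical stand-in for one board row. It assumes the title is the first child of the <a> inside <td class="td_subject">, padded with whitespace and followed by a nested <span> (for example a comment counter). The real markup may differ.

The sketch shows three things. First, container.a.font is None in the downloaded HTML, so the question's container.a.font.font raises AttributeError. Nested <font> tags like those are most likely added by the browser's page-translation feature and are not present in what urlopen fetches. Second, if contents[0] starts with a newline, that would explain why each title prints on the line after "subject:" above. Third, .strip() removes that padding, whereas get_text() also collects text from nested tags.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for one board row: the title is padded with whitespace
# and followed by a nested <span> (e.g. a comment counter). The live page may differ.
SAMPLE_HTML = """
<table>
  <tr>
    <td class="td_subject">
      <a href="#">
        공항에서 제일 가까운 레드썬 마사지
        <span class="cnt_cmt">+ 3</span>
      </a>
    </td>
  </tr>
</table>
"""

page_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# find_all is the current name of findAll; both are accepted by bs4.
containers = page_soup.find_all("td", {"class": "td_subject"})

for container in containers:
    # This HTML has no <font> tags, so .font returns None and
    # chaining a second .font onto None raises AttributeError.
    print("a.font:", container.a.font)
    try:
        container.a.font.font
    except AttributeError as err:
        print("a.font.font fails:", err)

    raw = container.a.contents[0]  # first child of <a>: the padded title string
    subject = raw.strip()          # drop the surrounding newlines and spaces
    print("raw:", repr(raw))
    print("subject: ", subject)
    # get_text() joins every string inside <a>, including the nested <span>.
    print("all text:", container.a.get_text(" ", strip=True))
```

If the real titles do carry that whitespace, the loop from the answer would become subject = container.a.contents[0].strip().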