Paul Lee

How to use BeautifulSoup for Webscraping

I am trying to scrape the subject titles of all the forum posts on this website. I am not sure how to go about it, because the forum's HTML structure is not something I am familiar with.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'http://thailove.net/bbs/board.php?bo_table=ent'

uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")

#I don't think this is correct, but I'm not sure how else to do this...
containers = page_soup.findAll("td",{"class":"td_subject"})


for container in containers:
    subject = container.a.font.font.contents
    #similarly not sure this is correct
print("subject: ", subject)

Please let me know what I should do. Also keep in mind that the website is in Korean but can be easily translated into English if need be.
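A probable reason `container.a.font.font` fails: the HTML the server sends most likely has no `<font>` tags inside the subject links. Nested `<font>` elements like these are typically injected by a browser's built-in page translation, so they can appear in the browser's inspector without being present in what `urlopen` downloads. In that case `container.a.font` is `None`, and looking up `.font` on `None` raises `AttributeError`. The sketch below reproduces this offline, assuming a simplified, hypothetical version of the board's markup (`SAMPLE_HTML` is made up; the real page may nest things differently).

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup for one row of the board.
# The real page at thailove.net may be structured differently.
SAMPLE_HTML = """
<table>
  <tr>
    <td class="td_subject">
      <a href="#">
        First post title
      </a>
    </td>
  </tr>
</table>
"""

page_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# An attrs dict as the second argument matches every <td class="td_subject">.
containers = page_soup.find_all("td", {"class": "td_subject"})

link = containers[0].a
# There is no <font> tag in this HTML, so the .font lookup returns None.
print(link.font)  # None

# Chaining another lookup onto None raises AttributeError.
try:
    link.font.font
except AttributeError as exc:
    print("AttributeError:", exc)
```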

Answers (1)

Vinícius Figueiredo

Your code is fine until the for loop. There you should access container.a.contents[0] to get each subject, and the print call needs to be inside the loop:

for container in containers:
    subject = container.a.contents[0]
    print("subject: ", subject)

Running the script then gives:

>>>     
subject:  
                    미성년자도 이용하는 게시판이므로 글 수위를 지켜주세요.                    
subject:  
                    방콕의 대표 야시장 - 딸랏롯파이2                    
subject:  
                    공항에서 제일 가까운 레드썬 마사지
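.......

(In English, these titles read roughly: "This board is also used by minors, so please keep posts appropriate.", "Bangkok's signature night market - Talad Rot Fai 2", and "Red Sun Massage, closest to the airport".)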
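The subjects come back padded with the newlines and spaces that surround the text inside each `<a>` tag. If you want clean strings, `get_text(strip=True)` removes that padding (as would `container.a.contents[0].strip()`), and a CSS selector passed to `select()` can find the links in one step. A minimal sketch, again using a hypothetical `SAMPLE_HTML` snippet in place of the live page. One difference to keep in mind: `get_text()` also collects text from any child tags inside the link (for example, a comment counter, if the real page has one), whereas `contents[0]` returns only the link's first child node.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded board page; in the real
# script this would be page_html from urlopen(...).read().
SAMPLE_HTML = """
<table>
  <tr><td class="td_subject"><a href="#">
      First post title
  </a></td></tr>
  <tr><td class="td_subject"><a href="#">
      Second post title
  </a></td></tr>
</table>
"""

page_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# "td.td_subject a" selects every <a> inside a <td class="td_subject">;
# get_text(strip=True) trims the surrounding whitespace from the text.
subjects = [link.get_text(strip=True) for link in page_soup.select("td.td_subject a")]

for subject in subjects:
    print("subject:", subject)
```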
