rahlf23

Reputation: 9019

Python: Extract text from website that is not in the raw HTML

I am scraping data from webpages and need to store that data (a bunch of strings) in a txt file. I already have code that does this for many websites, but I have hit a roadblock where BeautifulSoup does not seem to work.

Take this website for example: http://www.vucommodores.com/gametracker/launch/gt_mbasebl.html?event=1530990&school=vand&sport=mbasebl&camefrom=&startschool=&

I want to be able to click on the play-by-play button and then extract the text from the 1st inning, 2nd inning, etc. Is anyone aware of a method to do so? Unlike in all of my other examples, the text is not available in the raw HTML.

Thanks!

Upvotes: 2

Views: 1010

Answers (2)

pythad

Reputation: 4267

@Lgiro is right. If you want to manipulate page elements, for example to switch tabs or click buttons, you need to simulate a browser and inject JavaScript into the window. The best tool for this is Selenium. Here are the python-selenium docs.

Upvotes: 2

Lgiro

Reputation: 772

I don't think this is what BeautifulSoup is meant for. You can use Selenium for Python to interact with the page as if from a browser and simulate the click, then extract the text from the resulting HTML.
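
A rough sketch of that approach, assuming Selenium 4 and a local Chrome install. The function names and the `button_css` and `content_css` selectors are hypothetical: the real selectors have to be found by inspecting the page in the browser's developer tools. The small standard-library text collector is only a stand-in for the parsing step; the rendered HTML could just as well be handed to BeautifulSoup.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect non-empty text chunks, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def text_chunks(html):
    """Return the visible text pieces of an HTML string, in document order."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


def fetch_rendered_html(url, button_css, content_css, timeout=10):
    """Open the page in Chrome, click a button, and return the rendered HTML.

    button_css and content_css are hypothetical CSS selectors for the
    play-by-play button and for the element that appears after the click.
    """
    # Imported here so text_chunks() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        # Wait until the button can be clicked, then click it.
        wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, button_css))).click()
        # Wait for the content that the click is expected to load.
        wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, content_css)))
        return driver.page_source
    finally:
        driver.quit()
```

With real selectors filled in, `text_chunks(fetch_rendered_html(url, button_css, content_css))` would give a list of strings that can be written to the txt file one per line.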

Upvotes: 2
