Scraping data from multiple previous dates in WSJ stocks website

Question

I am scraping data from the WSJ Biggest Gainers page. I am new to Python, so I'm sure this is simple, but I can't find a clear answer.

My code currently downloads the data from only one page, but I want it to go back through previous days as well and use find_all or select to pull the data from the charts on each page. How can I modify the URL in the code to do this? I am using Python 3.4.3 and bs4.

The nice thing is that the URLs for previous days differ only in the date digits.

For example, this is last Friday:

http://online.wsj.com/mdc/public/page/2_3021-gainnnm-gainer-20150731.html?mod=mdc_pastcalendar

This is last Thursday:

http://online.wsj.com/mdc/public/page/2_3021-gainnnm-gainer-20150730.html?mod=mdc_pastcalendar

Ideally, I would like to be able to change the month, day, or year, and then loop over the resulting page URLs to retrieve the data.

Here is my code:

import requests
from bs4 import BeautifulSoup

url = 'http://online.wsj.com/mdc/public/page/2_3021-gainnyse-gainer.html'

r = requests.get(url)  # download the page's HTML

# name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(r.content, 'html.parser')

v_data = soup.select('.text')  # every element with the CSS class "text"

for symbol in v_data:
    print(symbol.text)

I just want to run this code in a loop for the past X days. I have tried making a list of URLs by hand, with no luck. Writing that list out is also more work, so it would be better if I could use something like %s or %d for the month, day, and year.
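One possible way to build those URLs is to let datetime format the date part instead of filling in %s or %d by hand. The sketch below is only an illustration: the URL pattern is copied from the two example links above, and it assumes the date is always written as YYYYMMDD. The names URL_TEMPLATE and build_urls are made up for this example, and skipping weekends is an assumption that does not account for market holidays.

```python
from datetime import date, timedelta

# Pattern copied from the two example links in the question.
# Assumption: the date in the URL is always YYYYMMDD.
URL_TEMPLATE = ('http://online.wsj.com/mdc/public/page/'
                '2_3021-gainnnm-gainer-{:%Y%m%d}.html?mod=mdc_pastcalendar')


def build_urls(end_day, num_days):
    """Return URLs for num_days weekdays, counting back from end_day (inclusive).

    build_urls is a hypothetical helper, not part of requests or bs4.
    """
    urls = []
    day = end_day
    while len(urls) < num_days:
        # weekday(): Monday is 0 and Sunday is 6, so 5 and 6 are the weekend.
        # Assumption: weekend pages do not exist; holidays are not handled.
        if day.weekday() < 5:
            urls.append(URL_TEMPLATE.format(day))
        day -= timedelta(days=1)
    return urls


urls = build_urls(date(2015, 7, 31), 3)
for url in urls:
    print(url)
```

Each URL in the list could then be passed to requests.get and BeautifulSoup inside a for loop, in the same way the single url is handled in the code above. Passing date.today() as end_day would start from the current day instead of a fixed date.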
