If you visit http://www.imdb.com/title/tt2375692/episodes?season=1, you will see that the air date of season 1, episode 1 is 25 Jan. 2014.
This is the code I am using to scrape the page:
import urllib2
from bs4 import BeautifulSoup

req = urllib2.Request('http://www.imdb.com/title/tt2375692/episodes?season=1')
self.diziPage = urllib2.urlopen(req).read()
self.diziSoup = BeautifulSoup(self.diziPage, from_encoding="utf8")
After scraping the page, everything is correct except the air date: episode 1's air date comes out as 20 April 2014, which does not appear anywhere on the page when I visit it in a browser. All of the other information is correct.
I thought it might be caused by the request headers, so I experimented with them, but that didn't help.
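For reference, a header experiment of this kind might look like the following in Python 3 (`urllib.request` replaces Python 2's `urllib2`). This is only a sketch: the header values, including the `Accept-Language` value, are assumptions for illustration and do not come from the question, and the request is built but not sent. If the site chooses dates by the visitor's IP address rather than by headers, changing headers like these would not be expected to change the result.

```python
import urllib.request

URL = "http://www.imdb.com/title/tt2375692/episodes?season=1"

def build_request(url=URL, language="en-US"):
    # Illustrative header values (assumptions, not from the question).
    headers = {
        "User-Agent": "Mozilla/5.0",
        "Accept-Language": language,
    }
    # Request stores header names via str.capitalize(),
    # so "Accept-Language" is kept as "Accept-language".
    return urllib.request.Request(url, headers=headers)

req = build_request()
print(req.get_header("Accept-language"))  # en-US
# Sending it would be: urllib.request.urlopen(req).read()
```

Note that `get_header` does an exact-name lookup on the capitalized form, which is easy to trip over when checking whether a header was actually set.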
It seems IMDb shows different air dates depending on the visitor's location. That is why I'm getting different data; I think they check the visitor's IP address or something similar.
I get 25 Jan. 2014 when I scrape the date using BeautifulSoup. First, find the link to the first episode, "I.", then get the episode block by taking the parent of the link's parent, then find the date by its class inside that block:
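import urllib2
from bs4 import BeautifulSoup

url = "http://www.imdb.com/title/tt2375692/episodes?season=1"
soup = BeautifulSoup(urllib2.urlopen(url))
# the link titled "I." points to episode 1; its grandparent is the episode block
episode1 = soup.find('a', {'title': 'I.'}).parent.parent
# the air date sits in a <div class="airdate"> inside that block
print episode1.find('div', {'class': 'airdate'}).text.strip()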
prints:
25 Jan. 2014
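To try the same navigation offline, here is a minimal sketch. It swaps in the standard library's `xml.etree.ElementTree` instead of BeautifulSoup, so it runs without network access or third-party packages, and it parses a small hypothetical snippet that only mimics the selectors used above (an `<a title="I.">` link and a `<div class="airdate">`); the real IMDb markup is different and has likely changed. Because ElementTree elements have no `.parent`, the sketch builds a child-to-parent map to imitate `.parent.parent`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup modelled on the selectors above;
# it is NOT the real IMDb page.
SNIPPET = """
<div class="list">
  <div class="list_item">
    <div class="image"><a title="I." href="#ep1">I.</a></div>
    <div class="info">
      <div class="airdate">
        25 Jan. 2014
      </div>
    </div>
  </div>
</div>
"""

def airdate_for(title, markup=SNIPPET):
    root = ET.fromstring(markup.strip())
    # ElementTree has no parent pointers, so map each child to its parent.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    link = root.find(".//a[@title='%s']" % title)
    if link is None:
        return None
    block = parent_of[parent_of[link]]  # same idea as link.parent.parent
    date_div = block.find(".//div[@class='airdate']")
    return date_div.text.strip() if date_div is not None else None

print(airdate_for("I."))  # 25 Jan. 2014
```

Going up to the shared episode block before searching for the date keeps the lookup tied to that one episode, rather than picking up the first `airdate` div on the page.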