Reputation: 20147
I want to monitor changes to specific content on some web pages. I want to do this on a daily basis, using a script or a browser plugin.
For example, I want to get notified when particular content on certain web pages changes, based on my query, without subscribing to the sites' own subscription services.
Upvotes: 0
Views: 142
Reputation: 20147
Here is my code showing how I scrape a table from one site. On that site the table has no id or class, so the XPath does not need one. If your table does have an id or class, use html.xpath('//table[@id="id_val"]/tr') instead of html.xpath('//table/tr').
import time
from urllib.request import urlopen  # Python 3; the original used Python 2's urllib.urlopen
from lxml import etree

while True:
    time.sleep(60)  # 1 minute interval
    # time.sleep(86400)  # 1 day interval
    web = urlopen("http://www.yoursite.com/")
    html = etree.HTML(web.read())
    tr_nodes = html.xpath('//table/tr')
    # Keep rows whose third column is 'Across India' or lists 'Chennai'
    td_content = []
    for tr in tr_nodes:
        tds = tr.xpath('td')
        if len(tds) < 7:
            continue  # skip header rows and rows that are too short
        location = tds[2].text or ''
        if location == 'Across India' or 'Chennai' in location.split('/'):
            td_content.append(tds)
    main_list = []
    for i in td_content:
        # Sixth column: keep 'Freshers' entries or ranges containing '0'
        experience = i[5].text or ''
        if 'Freshers' in experience.split('/') or '0' in experience.split(' '):
            sub_list = [td.text for td in i]
            # Insert the absolute link taken from the seventh column
            sub_list.insert(6, 'http://yoursite.com/%s' % i[6].xpath('a')[0].get('href'))
            main_list.append(sub_list)
    print('main_list', main_list)
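The loop above prints the matching rows on every pass. To be notified only when something actually changes, one option is to fingerprint the rows and compare against the previous run. This is a minimal sketch, assuming the rows are lists of strings like main_list; fingerprint and has_changed are hypothetical helper names, not part of the answer's code.

```python
import hashlib
import json


def fingerprint(rows):
    """Return a SHA-256 hex digest of a list of row lists."""
    # json.dumps gives a deterministic text form for lists of str/None
    payload = json.dumps(rows).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def has_changed(rows, previous_digest):
    """Return (changed, new_digest) for the current rows."""
    digest = fingerprint(rows)
    return digest != previous_digest, digest


# Made-up rows standing in for three successive values of main_list
previous = None
for rows in ([["a", "Chennai"]], [["a", "Chennai"]], [["b", "Chennai"]]):
    changed, previous = has_changed(rows, previous)
    print("changed" if changed else "unchanged")
```

Inside the polling loop you would keep the digest between iterations and send a notification (or simply print) only when has_changed reports True. The very first pass counts as a change because there is no earlier digest to compare with.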
Upvotes: 2
Reputation: 373
You can do this simply by writing a Python script based on the urllib/requests/Beautiful Soup modules.
Write a function that parses the required part of the website, then call it in a loop and check whether the result meets your requirement. If it doesn't, wait for some time (using the time module's time.sleep() function) and run the check again, and keep repeating.
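import time

CHECK_INTERVAL = 60  # placeholder: seconds to wait before rechecking

def parse(url):
    # extract the content you want and return it
    pass

while True:  # placeholder: replace with your own loop condition
    content = parse("http://www.yoursite.com/")  # placeholder URL
    if content:  # placeholder: replace with your own condition
        pass  # do this when the condition is met
    else:
        pass  # do this otherwise
    time.sleep(CHECK_INTERVAL)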
That's it, and you are done. Don't forget to import the modules! :)
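As one way to fill in the skeleton above, here is a minimal sketch that uses only the standard library (urllib.request and html.parser) in place of requests/Beautiful Soup. The names TextExtractor, fetch and watch, the keyword check, and the URL in the usage comment are all assumptions for illustration; adapt the condition to whatever you actually need to detect.

```python
import time
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect the text nodes of a page (a simplification: script and
    style contents are collected too)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def parse(html_text):
    """Return the page text as one string (stand-in for 'the content you want')."""
    extractor = TextExtractor()
    extractor.feed(html_text)
    extractor.close()
    return " ".join(extractor.chunks)


def fetch(url):
    """Download a page and decode it; assumes the page is UTF-8."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def watch(get_html, keyword, interval=60, max_checks=None, sleep=time.sleep):
    """Poll get_html() until keyword appears in the parsed text.

    Returns True when found, False if max_checks runs out first.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        if keyword in parse(get_html()):
            return True
        checks += 1
        sleep(interval)  # wait before checking again
    return False


# Usage (hypothetical URL and keyword), checking once a day:
# watch(lambda: fetch("http://www.yoursite.com/"), "Chennai", interval=86400)
```

Passing the download step in as get_html keeps the loop easy to try out with canned HTML before pointing it at a real site.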
Upvotes: 2