Reputation: 439
I have a simple Python script that fetches tweets and caches them to disk. It is configured to run every two minutes via cron:
*/2 * * * * (date ; /usr/bin/python /path/get_tweets.py) >> /path/log/get_tweets.log 2>&1
The script runs successfully most of the time. However, every so often it doesn't execute. In addition to other logging, I added a simple print statement above the meat of the script, and on the failing runs nothing except the output of the initial date command makes it to the log.
#!/usr/bin/python
# Script for Fetching Tweets and then storing them as an HTML snippet for inclusion using SSI
print "Starting get_tweets.py"
import simplejson as json
import urllib2
import httplib
import re
import calendar
import codecs
import os
import rfc822
from datetime import datetime
import time
import sys
import pprint
debug = True
now = datetime.today()
template = u'<p class="tweet">%s <span class="date">on %s</span></p>'
html_snippet = u''
timelineUrl = 'http://api.twitter.com/1/statuses/user_timeline.json?screen_name=gcorne&count=7'
tweetFilePath = '/path/server-generated-includes/tweets.html'
if(debug): print "[%s] Fetching tweets from %s." % (now, timelineUrl)
def getTweets():
    request = urllib2.Request(timelineUrl)
    opener = urllib2.build_opener()
    try:
        tweets = opener.open(request)
    except:
        print "[%s] HTTP Request %s failed." % (now, timelineUrl)
        exitScript()
    tweets = tweets.read()
    return tweets

def exitScript():
    print "[%s] Script failed." % (now)
    sys.exit(0)
tweets = getTweets()
now = datetime.today()
if(debug): print "[%s] Tweets retrieved." % (now)
tweets = json.loads(tweets)
for tweet in tweets:
    text = tweet['text'] + ' '
    when = tweet['created_at']
    when = re.match(r'(\w+\s){3}', when).group(0).rstrip()
    # print GetRelativeCreatedAt(when)
    # convert links
    text = re.sub(r'(http://.*?)\s', r'<a href="\1">\1</a>', text).rstrip()
    # convert hashtags
    text = re.sub(r'#(\w+)', r'<a href="http://www.twitter.com/search/?q=%23\1">#\1</a>', text)
    # convert @ replies
    text = re.sub(r'@(\w+)', r'@<a href="http://www.twitter.com/\1">\1</a>', text)
    html_snippet += template % (text, when) + "\n"
#print html_snippet
now = datetime.today()
if(debug): print "[%s] Opening file %s." % (now, tweetFilePath)
try:
    file = codecs.open(tweetFilePath, 'w', 'utf_8')
except:
    print "[%s] File %s could not be opened." % (now, tweetFilePath)
    exitScript()
now = datetime.today()
if(debug): print "[%s] Writing %s to disk." % (now, tweetFilePath)
file.write(html_snippet)
now = datetime.today()
if(debug): print "[%s] Finished writing %s to disk." % (now, tweetFilePath)
file.close()
sys.exit(0)
Any ideas? The system is a VPS running CentOS 5.3 with Python 2.4.
Update: I have added the entire script to avoid any confusion.
Upvotes: 2
Views: 1132
Reputation: 11
I just had a problem with a Python script that sometimes wouldn't run from crontab but always ran from the command line. It turned out I had to redirect logging to /dev/null. Otherwise the standard output seems to fill up, the program just stops, and the process is killed off. With the output dumped to /dev/null, everything is fine.
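For illustration, the same idea can be applied from inside Python instead of in the crontab entry. This is only a minimal sketch, assuming Python 3.5 or later (it will not run on the Python 2.4 from the question). `run_quietly` and `noisy_job` are hypothetical names, not part of the original script. It silences only output written through Python's `sys.stdout` and `sys.stderr`, not output from child processes.

```python
import contextlib
import os


def run_quietly(func, *args, **kwargs):
    """Call func with Python-level stdout and stderr sent to the null device."""
    with open(os.devnull, "w") as devnull:
        with contextlib.redirect_stdout(devnull), contextlib.redirect_stderr(devnull):
            return func(*args, **kwargs)


def noisy_job():
    # Hypothetical stand-in for the real work done by the script.
    print("Starting job")
    return 7


if __name__ == "__main__":
    # The print inside noisy_job is discarded; only the return value survives.
    result = run_quietly(noisy_job)
    print("result:", result)
```

Note that discarding the output also discards the evidence needed to debug a failed run, so this is probably better as a test than as a permanent setup.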
Upvotes: 1
Reputation: 882023
The most likely explanation is that once in a while the script takes more than two minutes (maybe the system is occasionally very busy, or the script has to wait for an external site that is occasionally busy, etc.) and your cron is a sensible one that skips repeating events that haven't yet terminated. By logging the starting and ending times of your script, you'll be able to double-check whether that is the case. What you want to do in such circumstances is up to you. I recommend you consider skipping an occasional run to avoid further overloading a very busy system, whether your own or the remote one you're getting data from.
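One way to log the start and end times, and to skip a run explicitly while the previous one is still going, is a non-blocking file lock. The following is a rough sketch, assuming Python 3 on a Unix-like system where the `fcntl` module is available. The lock path and the names `acquire_lock` and `timed_run` are made up for illustration and are not part of the original script.

```python
import fcntl
import os
import sys
import tempfile
import time
from datetime import datetime

# Hypothetical lock location; pick a path the cron user can write to.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "get_tweets.lock")


def acquire_lock(path):
    """Return an open file holding an exclusive lock, or None if it is already held."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the lock.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle


def timed_run(job, lock_path=LOCK_PATH):
    """Run job() under the lock and log start, end, and duration.

    Returns True if the job ran, False if another run held the lock.
    """
    lock = acquire_lock(lock_path)
    if lock is None:
        print("[%s] Previous run still active, skipping." % datetime.now())
        return False
    started = time.time()
    print("[%s] Run started." % datetime.now())
    try:
        job()
    finally:
        elapsed = time.time() - started
        print("[%s] Run finished after %.1f s." % (datetime.now(), elapsed))
        sys.stdout.flush()
        lock.close()  # closing the file releases the lock
    return True
```

Wrapping the body of the script in a function and calling `timed_run(main)` would then leave either a start/finish pair or a "skipping" line in the log for every cron invocation, which shows directly whether runs overlap.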
Upvotes: 2