gcorne

Reputation: 439

Python script run via cron occasionally does not execute

I have a simple Python script that fetches tweets and caches them to disk. It is configured to run every two minutes via cron.

*/2 * * * * (date ; /usr/bin/python /path/get_tweets.py) >> /path/log/get_tweets.log 2>&1

The script runs successfully most of the time. However, every so often the script doesn't execute. In addition to other logging, I added a simple print statement at the very top of the script, and on the failed runs nothing except the output of the initial date command makes it to the log.

#!/usr/bin/python
# Script for Fetching Tweets and then storing them as an HTML snippet for inclusion using SSI

print "Starting get_tweets.py"

import simplejson as json
import urllib2
import httplib
import re
import calendar
import codecs
import os
import rfc822
from datetime import datetime
import time
import sys
import pprint


debug = True 

now = datetime.today()
template = u'<p class="tweet">%s <span class="date">on %s</span></p>'
html_snippet = u''
timelineUrl = 'http://api.twitter.com/1/statuses/user_timeline.json?screen_name=gcorne&count=7'
tweetFilePath = '/path/server-generated-includes/tweets.html'
if(debug): print "[%s] Fetching tweets from %s." % (now, timelineUrl)

def getTweets():
    request = urllib2.Request(timelineUrl)
    opener = urllib2.build_opener()
    try:
        tweets = opener.open(request)
    except:
        print "[%s] HTTP Request %s failed." % (now, timelineUrl)
        exitScript()
    tweets = tweets.read()
    return tweets

def exitScript():
    print "[%s] Script failed." % (now)
    sys.exit(0)


tweets = getTweets()
now = datetime.today()
if(debug): print "[%s] Tweets retrieved." % (now)
tweets = json.loads(tweets)

for tweet in tweets:
    text = tweet['text'] + ' '
    when = tweet['created_at']
    when = re.match(r'(\w+\s){3}', when).group(0).rstrip()
    # print GetRelativeCreatedAt(when)
    # convert links
    text = re.sub(r'(http://.*?)\s', r'<a href="\1">\1</a>', text).rstrip()
    #convert hashtags
    text = re.sub(r'#(\w+)', r'<a href="http://www.twitter.com/search/?q=%23\1">#\1</a>', text)
    # convert @ replies
    text = re.sub(r'@(\w+)', r'@<a href="http://www.twitter.com/\1">\1</a>', text)
    html_snippet += template % (text, when) + "\n"

#print html_snippet

now = datetime.today()
if(debug): print "[%s] Opening file %s." % (now, tweetFilePath)
try:
    file = codecs.open(tweetFilePath, 'w', 'utf_8')
except:
    print "[%s] File %s could not be opened." % (now, tweetFilePath)
    exitScript()

now = datetime.today()
if(debug): print "[%s] Writing %s to disk." % (now, tweetFilePath)
file.write(html_snippet)

now = datetime.today()
if(debug): print "[%s] Finished writing %s to disk." % (now, tweetFilePath)
file.close()
sys.exit(0)

Any ideas? The system is a VPS running CentOS 5.3 with Python 2.4.

Update: I have added the entire script to avoid any confusion.

Upvotes: 2

Views: 1132

Answers (2)

flexdream

Reputation: 11

I just had a problem with a Python script that sometimes wouldn't run from crontab but always ran from the command line. It turned out I had to redirect the logging to /dev/null. Otherwise the standard output seems to fill up, the program just stops, and the process is killed off. With the output dumped to /dev/null, everything's fine.
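A related point that may be worth ruling out: when standard output is redirected to a file, as in the crontab entry in the question, Python buffers it in blocks, so a run that hangs or is killed can lose messages it had already printed. The following is a minimal sketch, not the asker's code, of a logging helper that flushes after every message. The `log` name and the message format are illustrative assumptions, and it is written in current Python syntax rather than the Python 2.4 of the question.

```python
import sys
from datetime import datetime


def log(message, stream=None):
    """Write one timestamped line and flush it immediately."""
    if stream is None:
        stream = sys.stdout
    stream.write("[%s] %s\n" % (datetime.today(), message))
    # Without this flush, output redirected to a file can sit in a buffer
    # and be lost if the process hangs or is killed before it exits.
    stream.flush()


if __name__ == "__main__":
    log("Starting get_tweets.py")
```

If the "Starting" line then shows up in the log on the runs that previously looked empty, the script was starting after all and stalling somewhere later.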

Upvotes: 1

Alex Martelli

Reputation: 882023

The most likely explanation is that once in a while the script takes more than two minutes (maybe the system is occasionally very busy, or the script has to wait for an external site that is occasionally busy, etc.) and your cron is a sensible one that skips repeating events that haven't yet terminated. By logging the starting and ending times of your script, you'll be able to double-check whether that is the case. What you want to do in such circumstances is up to you (I recommend you consider skipping an occasional run to avoid further overloading a very busy system -- your own, or the remote one you're getting data from).
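Whether or not cron itself skips overlapping runs, the script can do the double check on its own. The sketch below, under stated assumptions, logs start and end times, caps network waits with a socket timeout, and uses a non-blocking file lock to skip a run while the previous one is still active. The names `LOCK_PATH`, `acquire_lock`, and `run_once`, the lock location, and the 30-second timeout are all hypothetical. It uses current Python syntax and the Unix-only `fcntl` module.

```python
import fcntl
import socket
import sys
import time

# Hypothetical lock path; pick a location the cron user can write to.
LOCK_PATH = "/tmp/get_tweets.lock"

# Cap every network wait so a slow remote site cannot stall a run.
# The 30-second value is an arbitrary assumption.
socket.setdefaulttimeout(30)


def acquire_lock(path):
    """Return an open, locked file, or None if another run holds the lock."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except (IOError, OSError):
        handle.close()
        return None
    return handle


def run_once(job, lock_path=LOCK_PATH, out=sys.stdout):
    """Run job() under the lock, logging start and end; return True if it ran."""
    lock = acquire_lock(lock_path)
    if lock is None:
        out.write("Previous run still active, skipping.\n")
        out.flush()
        return False
    started = time.time()
    out.write("Started at %s\n" % time.ctime(started))
    out.flush()
    try:
        job()
    finally:
        out.write("Finished after %.1f seconds\n" % (time.time() - started))
        out.flush()
        lock.close()  # closing the file releases the lock
    return True
```

In the question's script, the work from `getTweets()` through the final file write would play the role of `job`. A "skipping" line or a duration close to two minutes in the log would support the overlap explanation.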

Upvotes: 2
