Reputation: 1028
The objective of my cron job is to save tweets with their timestamps into Google App Engine's datastore. I haven't been able to figure out how to save the data in timestamp form (it is currently saved as a string). Ideally I'd like to save this as a DateTimeProperty so that sorting entries is easier down the road. There are two particular problems that I'm struggling with:
The field is formatted in the JSON like this:
s = "Wed, 20 Mar 2013 05:39:25 +0000"
I tried to use the datetime module to parse this string:
timestr = datetime.datetime.strptime(s, "%a, %b %Y %d %H:%M:%S +0000")
when = datetime.fromtimestamp(time.mktime(timestr))
To sum everything up, this is a snippet of my cron.py file:
result = simplejson.load(urllib.urlopen(twitterurl))
for item in result['results']:
    g = ""
    try:
        g = simplejson.dumps(item['geo']['coordinates'])
    except:
        pass
    timestr = datetime.datetime.strptime(str(item['created_at']), "%a, %b %Y %d %H:%M:%S +0000")
    when = datetime.fromtimestamp(time.mktime(timestr))
    tStore = TweetsFromJSON(user_id=str(item['from_user_id']),
                            user=item['from_user'],
                            tweet=unicodedata.normalize('NFKD', item['text']).encode('ascii', 'ignore'),
                            timestamp=when,
                            iso=item['iso_language_code'],
                            geo=g
                            )
The model for the datastore would be:
class TweetsFromJSON(db.Model):
    user = db.TextProperty()
    user_id = db.TextProperty()
    tweet = db.TextProperty()
    timestamp = db.DateTimeProperty()
    iso = db.StringProperty()
    geo = db.StringProperty()
Upvotes: 1
Views: 833
Reputation: 43495
You should use the following format to parse the time string with datetime.strptime:
"%a, %d %b %Y %H:%M:%S %z"
This works properly in Python 3:
Python 3.3.0 (default, Mar 22 2013, 20:14:41)
[GCC 4.2.1 Compatible FreeBSD Clang 3.1 ((branches/release_31 156863))] on freebsd9
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import datetime
>>> s = 'Wed, 20 Mar 2013 05:39:25 +0000'
>>> datetime.strptime(s, "%a, %d %b %Y %H:%M:%S %z")
datetime.datetime(2013, 3, 20, 5, 39, 25, tzinfo=datetime.timezone.utc)
Notice that this returns a datetime object, so further manipulation is unnecessary.
Unfortunately this doesn't work in Python 2:
Python 2.7.3 (default, Jan 17 2013, 21:23:30)
[GCC 4.2.1 Compatible FreeBSD Clang 3.0 (branches/release_30 142614)] on freebsd9
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import datetime
>>> s = 'Wed, 20 Mar 2013 05:39:25 +0000'
>>> datetime.strptime(s, "%a, %d %b %Y %H:%M:%S %z")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/_strptime.py", line 317, in _strptime
    (bad_directive, format))
ValueError: 'z' is a bad directive in format '%a, %d %b %Y %H:%M:%S %z'
This seems to be a bug in Python 2.7. The documentation mentions %z, but the code in /usr/local/lib/python2.7/_strptime.py doesn't contain the proper regular expression to match it (strptime only gained %z support in Python 3.2).
As a workaround on Python 2, you can try this:
>>> datetime.strptime(s[:-6], "%a, %d %b %Y %H:%M:%S")
datetime.datetime(2013, 3, 20, 5, 39, 25)
This just cuts off the last 6 characters, so it only works correctly if the timezone offset has a sign and four digits. Another alternative would be to use split and join:
>>> datetime.strptime(' '.join(s.split()[:-1]), "%a, %d %b %Y %H:%M:%S")
datetime.datetime(2013, 3, 20, 5, 39, 25)
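If the offset can vary and only the UTC instant matters, the standard library's email.utils.parsedate_tz might be another option on both Python 2 and 3, since this timestamp follows the RFC 2822 date style. This is a sketch of a different technique, not the slicing workaround above. The helper name to_naive_utc is made up for illustration, and it assumes a naive UTC datetime is what you want to put in the DateTimeProperty.

```python
from datetime import datetime, timedelta
from email.utils import parsedate_tz


def to_naive_utc(s):
    """Parse an RFC 2822 style date string into a naive datetime in UTC.

    Hypothetical helper for illustration only.
    """
    parsed = parsedate_tz(s)
    if parsed is None:
        raise ValueError("unparseable date: %r" % (s,))
    # parsed[9] is the offset from UTC in seconds (None if the string has none).
    offset_seconds = parsed[9] or 0
    # Local time = UTC + offset, so subtract the offset to get back to UTC.
    return datetime(*parsed[:6]) - timedelta(seconds=offset_seconds)


when = to_naive_utc('Wed, 20 Mar 2013 05:39:25 +0000')
```

Unlike cutting off the last six characters, this does not silently ignore a non-zero offset.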
Both workarounds discard the offset. To keep it in the datetime object on Python 2, from what I understand you would have to parse the timezone info yourself, create a custom tzinfo subclass (use the FixedOffset class example in the linked docs) and use datetime.replace() to put that in the datetime object.
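A minimal sketch of that idea, assuming the offset always has the form of a sign followed by four digits (+HHMM or -HHMM). The FixedOffset class here is a simplified version written for this answer, modelled on the docs example rather than copied from it, and parse_with_offset is a hypothetical helper name.

```python
from datetime import datetime, timedelta, tzinfo


class FixedOffset(tzinfo):
    """Fixed offset from UTC, in minutes east of UTC (simplified sketch)."""

    def __init__(self, offset_minutes, name):
        self._offset = timedelta(minutes=offset_minutes)
        self._name = name

    def utcoffset(self, dt):
        return self._offset

    def tzname(self, dt):
        return self._name

    def dst(self, dt):
        # A fixed offset has no daylight saving adjustment.
        return timedelta(0)


def parse_with_offset(s):
    """Parse e.g. 'Wed, 20 Mar 2013 05:39:25 +0000' into an aware datetime."""
    # Split the trailing offset from the rest of the timestamp.
    stamp, offset = s.rsplit(' ', 1)
    naive = datetime.strptime(stamp, "%a, %d %b %Y %H:%M:%S")
    # Assumes the offset is a sign followed by HHMM.
    sign = -1 if offset[0] == '-' else 1
    minutes = sign * (int(offset[1:3]) * 60 + int(offset[3:5]))
    return naive.replace(tzinfo=FixedOffset(minutes, offset))


aware = parse_with_offset('Wed, 20 Mar 2013 05:39:25 +0000')
```

The result is an aware datetime, so aware.utcoffset() reports the parsed offset instead of None.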
Upvotes: 1