shartshooter

Reputation: 1811

Python string to date, date to string

I have a list of blog posts with two columns: the date each post was created and the unique ID of the person who created it.

I want to return the date of the most recent blog post for each unique ID. Simple, but all of the date values are stored as strings, and none of the strings has a leading 0 when the month or day is less than 10.

I've been struggling with strftime and strptime but can't get them to return the right result.

import csv

Posters = {}
with open('datetouched.csv','rU') as f:
    reader = csv.reader(f)

    for i in reader:
        UID = i[0]
        Date = i[1]
        if UID in Posters:
            Posters[UID].append(Date)
        else:
            Posters[UID] = [Date]

    for i in Posters:
        print i, max(Posters[i]), Posters[i]

This returns the following output

0014000000s5NoEAAU 7/1/10 ['1/6/14', '7/1/10', '1/18/14', '1/24/14', '7/1/10', '2/5/14']
0014000000s5XtPAAU 2/3/14 ['1/4/14', '1/10/14', '1/16/14', '1/22/14', '1/28/14', '2/3/14']
0014000000vHZp7AAG 2/1/14 ['1/2/14', '1/8/14', '1/14/14', '1/20/14', '1/26/14', '2/1/14']
0014000000wnPK6AAM 2/2/14 ['1/3/14', '1/9/14', '1/15/14', '1/21/14', '1/27/14', '2/2/14']
0014000000d5YWeAAM 2/4/14 ['1/5/14', '1/11/14', '1/17/14', '1/23/14', '1/29/14', '2/4/14']
0014000000s5VGWAA2 7/1/10 ['7/1/10', '1/7/14', '1/13/14', '1/19/14', '7/1/10', '1/31/14']

It's returning 7/1/10 because the values are compared as strings, and '7' sorts after '1' and '2'. I need the max of each list to be the most recent date, returned as the exact same string value.

Upvotes: 0

Views: 124

Answers (2)

Jon Clements

Reputation: 142156

I'd convert each date to a datetime when loading, and store the results in a defaultdict, e.g.:

import csv
from collections import defaultdict
from datetime import datetime

posters = defaultdict(list)
with open('datetouched.csv','rU') as fin:
    csvin = csv.reader(fin)
    items = ((row[0], datetime.strptime(row[1], '%m/%d/%y')) for row in csvin)
    for uid, dt in items:
        posters[uid].append(dt)

for uid, dates in posters.iteritems():
    # print uid, list of datetime objects, and max date in same format as input
    print uid, dates, '{0.month}/{0.day}/{0:%y}'.format(max(dates))
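
As a small sketch of the formatting step on its own: strftime's %m and %d always zero-pad, so one way to reproduce the unpadded input style is to mix attribute access with a strftime-style format spec inside str.format. The helper name format_like_input and the sample dates are only illustrative.

```python
from datetime import datetime

def format_like_input(dt):
    # .month and .day are plain ints, so they are not zero-padded;
    # {0:%y} passes '%y' to datetime's strftime-based formatting.
    return '{0.month}/{0.day}/{0:%y}'.format(dt)

samples = ['1/6/14', '7/1/10', '2/5/14']
dates = [datetime.strptime(s, '%m/%d/%y') for s in samples]
latest = format_like_input(max(dates))
print(latest)  # 2/5/14
```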

Upvotes: 2

Martijn Pieters

Reputation: 1122002

Parse the dates with datetime.datetime.strptime(), either when loading the CSV or as a key function to max().

While loading:

from datetime import datetime

Date = datetime.strptime(i[1], '%m/%d/%y')

or when using max():

print i, max(Posters[i], key=lambda d: datetime.strptime(d, '%m/%d/%y')), Posters[i]

Demo of the latter:

>>> from datetime import datetime
>>> dates = ['1/6/14', '7/1/10', '1/18/14', '1/24/14', '7/1/10', '2/5/14']
>>> max(dates, key=lambda d: datetime.strptime(d, '%m/%d/%y'))
'2/5/14'
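
To make the difference concrete, here is a minimal side-by-side sketch using the same sample list. Without a key, max() compares the strings character by character; with the key, it compares the parsed dates but still returns the original string. The variable names as_text and as_date are just for illustration.

```python
from datetime import datetime

dates = ['1/6/14', '7/1/10', '1/18/14', '1/24/14', '7/1/10', '2/5/14']

# Plain max() compares the strings lexicographically, so the one
# starting with '7' wins even though it is the oldest date.
as_text = max(dates)

# With a key, max() ranks by the parsed datetime but returns the
# original, unmodified string.
as_date = max(dates, key=lambda d: datetime.strptime(d, '%m/%d/%y'))

print(as_text)  # 7/1/10
print(as_date)  # 2/5/14
```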

Your code can be optimised a little:

import csv
from datetime import datetime

posters = {}
with open('datetouched.csv','rb') as f:
    reader = csv.reader(f)
    for row in reader:
        uid, date = row[:2]
        # the month comes first in the data, e.g. '1/18/14'
        posters.setdefault(uid, []).append(datetime.strptime(date, '%m/%d/%y'))

for uid, dates in posters.iteritems():
    print uid, max(dates), dates

The dict.setdefault() method sets a default value (an empty list here) whenever the key is not present yet.
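
If the result has to be the exact input string, one option is to combine the two ideas: group the raw strings with setdefault() and only parse inside the key function. This is a rough sketch that uses a hypothetical in-memory rows list in place of the CSV file; the short IDs and the parse helper are made up for the example.

```python
from datetime import datetime

# Hypothetical rows standing in for the contents of datetouched.csv.
rows = [
    ('A', '1/6/14'),
    ('A', '7/1/10'),
    ('B', '1/4/14'),
    ('A', '2/5/14'),
    ('B', '2/3/14'),
]

def parse(d):
    # Month first, two-digit year, matching the question's data.
    return datetime.strptime(d, '%m/%d/%y')

posters = {}
for uid, date in rows:
    # The first time a uid is seen, setdefault stores [] and returns it;
    # afterwards it returns the list that is already there.
    posters.setdefault(uid, []).append(date)

# Keep the original strings; parse only to decide which one is latest.
latest = dict((uid, max(dates, key=parse)) for uid, dates in posters.items())
print(sorted(latest.items()))  # [('A', '2/5/14'), ('B', '2/3/14')]
```

The trade-off is that each string gets parsed again whenever max() runs, which is unlikely to matter for a file of this size.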

Upvotes: 2
