Reputation: 1811
I have a list of blog posts with two columns: the date they were created and the unique ID of the person creating them.
I want to return the date of the most recent blog post for each unique ID. Simple, but all of the date values are stored as strings, and none of the strings have a leading 0 when the month is less than 10.
I've been struggling w/ strftime and strptime but can't get them to return the right result.
import csv
Posters = {}
with open('datetouched.csv','rU') as f:
    reader = csv.reader(f)
    for i in reader:
        UID = i[0]
        Date = i[1]
        if UID in Posters:
            Posters[UID].append(Date)
        else:
            Posters[UID] = [Date]
for i in Posters:
    print i, max(Posters[i]), Posters[i]
This returns the following output:
0014000000s5NoEAAU 7/1/10 ['1/6/14', '7/1/10', '1/18/14', '1/24/14', '7/1/10', '2/5/14']
0014000000s5XtPAAU 2/3/14 ['1/4/14', '1/10/14', '1/16/14', '1/22/14', '1/28/14', '2/3/14']
0014000000vHZp7AAG 2/1/14 ['1/2/14', '1/8/14', '1/14/14', '1/20/14', '1/26/14', '2/1/14']
0014000000wnPK6AAM 2/2/14 ['1/3/14', '1/9/14', '1/15/14', '1/21/14', '1/27/14', '2/2/14']
0014000000d5YWeAAM 2/4/14 ['1/5/14', '1/11/14', '1/17/14', '1/23/14', '1/29/14', '2/4/14']
0014000000s5VGWAA2 7/1/10 ['7/1/10', '1/7/14', '1/13/14', '1/19/14', '7/1/10', '1/31/14']
It's returning 7/1/10 because the values are compared as strings, and the character '7' is larger than '1' or '2'. I need the max value of the list returned as the exact same string value.
Upvotes: 0
Views: 124
Reputation: 142156
I'd convert the date to a datetime when loading, and store the results in a defaultdict, e.g.:
import csv
from collections import defaultdict
from datetime import datetime
posters = defaultdict(list)
with open('datetouched.csv','rU') as fin:
    csvin = csv.reader(fin)
    items = ((row[0], datetime.strptime(row[1], '%m/%d/%y')) for row in csvin)
    for uid, dt in items:
        posters[uid].append(dt)
for uid, dates in posters.iteritems():
    # print uid, list of datetime objects, and max date in same format as input
    print uid, dates, '{0.month}/{0.day}/{0:%y}'.format(max(dates))
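The round trip back to the input's unpadded month/day format can be checked in isolation. This is a minimal sketch in Python 3 syntax; the helper name format_short_date is hypothetical and not part of the original answer.

```python
from datetime import datetime


def format_short_date(dt):
    """Format a datetime as M/D/YY without zero-padding month or day.

    Hypothetical helper for illustration. {0.month} and {0.day} are plain
    integers, so they carry no leading zero; {0:%y} passes the strftime
    directive %y to the datetime to get the two-digit year.
    """
    return '{0.month}/{0.day}/{0:%y}'.format(dt)


# strptime's %m and %d accept single-digit values such as '7' and '1'.
parsed = datetime.strptime('7/1/10', '%m/%d/%y')
print(format_short_date(parsed))
```

Using the integer attributes avoids relying on platform-specific strftime flags for dropping the padding.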
Upvotes: 2
Reputation: 1122002
Parse the dates with datetime.datetime.strptime(), either when loading the CSV or as a key function to max().
While loading:
from datetime import datetime
Date = datetime.strptime(i[1], '%m/%d/%y')
or when using max():
print i, max(Posters[i], key=lambda d: datetime.strptime(d, '%m/%d/%y')), Posters[i]
Demo of the latter:
>>> from datetime import datetime
>>> dates = ['1/6/14', '7/1/10', '1/18/14', '1/24/14', '7/1/10', '2/5/14']
>>> max(dates, key=lambda d: datetime.strptime(d, '%m/%d/%y'))
'2/5/14'
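If only the most recent date per ID is needed, the same comparison can be applied while reading the rows, so no per-ID list has to be kept. A rough sketch in Python 3 syntax, assuming each row is a (uid, date string) pair; the names parse and latest_per_uid and the short IDs 'A' and 'B' are made up for illustration.

```python
from datetime import datetime


def parse(date_string):
    """Parse an M/D/YY string such as '2/5/14' into a datetime."""
    return datetime.strptime(date_string, '%m/%d/%y')


def latest_per_uid(rows):
    """Return {uid: most recent date string}, keeping the original strings."""
    latest = {}
    for uid, date_string in rows:
        # Compare parsed dates, but store the untouched input string.
        if uid not in latest or parse(date_string) > parse(latest[uid]):
            latest[uid] = date_string
    return latest


# Hypothetical sample rows standing in for the CSV contents.
rows = [
    ('A', '1/6/14'), ('A', '7/1/10'), ('A', '2/5/14'),
    ('B', '7/1/10'), ('B', '1/31/14'),
]
print(latest_per_uid(rows))
```

In real use the rows would come from csv.reader, for example as (row[0], row[1]) for each row.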
Your code can be optimised a little:
import csv
from datetime import datetime
posters = {}
with open('datetouched.csv','rb') as f:
    reader = csv.reader(f)
    for row in reader:
        uid, date = row[:2]
        # parse as month/day/year, matching the input format
        posters.setdefault(uid, []).append(datetime.strptime(date, '%m/%d/%y'))
for uid, dates in posters.iteritems():
    print uid, max(dates), dates
The dict.setdefault() method sets a default value (an empty list here) whenever the key is not present yet.
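For comparison, here is a small sketch of the same grouping done with dict.setdefault() and with collections.defaultdict, in Python 3 syntax. The sample pairs are invented for illustration.

```python
from collections import defaultdict

# Hypothetical (key, value) pairs standing in for (uid, date) rows.
pairs = [('a', 1), ('b', 2), ('a', 3)]

# Plain dict: setdefault inserts an empty list the first time a key is seen,
# then returns the stored list so the value can be appended to it.
grouped_plain = {}
for key, value in pairs:
    grouped_plain.setdefault(key, []).append(value)

# defaultdict: the missing-key list is created automatically on lookup.
grouped_default = defaultdict(list)
for key, value in pairs:
    grouped_default[key].append(value)

print(grouped_plain)
print(dict(grouped_default))
```

Both produce the same mapping; one difference is that setdefault builds a new empty list on every call, even when the key already exists.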
Upvotes: 2