Reputation: 2093
I have a text file that has about 10 columns, 7 of which are date/time stamps (year, month, day, hour, minute, second, centisec). So the data looks roughly like...
User[TAB]System[TAB]Year[TAB]Month[TAB]Day ... centisec[TAB]Message
Sorry for the horrible formatting, but I hope this gives you the idea.
So if I wanted to sort the file by year, I could use this:
sorted_lines = sorted(unsortedfile,key=lambda l: int(l.split('\t')[2]))
Take the unsorted file, split each line by tabs, take the 3rd column, convert it to an int, and sort by that. I could do the same thing for any one column.
What I'm looking for is a better way to sort this by all the date/time columns. So sort by Year, then by month, then by day... etc
I can think of a few complicated ways of doing this (reading each line and combining all the columns, or doing a recursive sort by each column), but I'm hoping someone has a simpler, more Pythonic way to do the same thing.
Upvotes: 3
Views: 2026
Reputation: 473863
You can use the csv module to parse the file with delimiter='\t' and apply sorted() on the reader object with a custom key function that parses the date into a datetime object:
import csv
from datetime import datetime
from pprint import pprint

def sort_by_datetime(line):
    return datetime.strptime('{0}-{1}-{2}'.format(*line[2:5]), '%Y-%m-%d')

with open('input.txt') as f:
    reader = csv.reader(f, delimiter='\t')
    pprint(sorted(reader, key=sort_by_datetime))
For the input.txt:
User1 System1 2013 1 31
User2 System2 2014 12 1
User3 System3 2012 12 31
User4 System4 2012 6 15
User5 System5 2014 1 1
it would print:
[['User4', 'System4', '2012', '6', '15'],
['User3', 'System3', '2012', '12', '31'],
['User1', 'System1', '2013', '1', '31'],
['User5', 'System5', '2014', '1', '1'],
['User2', 'System2', '2014', '12', '1']]
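Since the question asks about sorting on all seven date/time columns (down to centiseconds, which strptime's date format above doesn't cover), a simpler alternative is a tuple key: convert each timestamp column to int and let Python compare the tuples lexicographically. A minimal sketch, assuming the timestamp fields start at column index 2 (here only year/month/day are shown, as in the sample data; a real file would slice through the centisec column, e.g. row[2:9]):

```python
import csv
import io

# Sample lines in the question's tab-separated format
# (only the year/month/day columns for brevity).
data = "User1\tSys1\t2013\t1\t31\nUser2\tSys2\t2012\t12\t31\n"

def sort_key(row):
    # Convert each date/time field to int so that '6' sorts
    # before '12'; tuples compare element by element, giving
    # year-then-month-then-day ordering automatically.
    return tuple(int(x) for x in row[2:5])

rows = sorted(csv.reader(io.StringIO(data), delimiter='\t'), key=sort_key)
```

This avoids building a datetime at all and extends to any number of numeric sort columns by widening the slice.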
Upvotes: 3