Reputation: 155
I am trying to read in a csv file which is a result from running Autoruns autorunsc command line (i.e., autorunsc -a -m -c > mycsv.csv) and sort it by time in a way which I get the newest date/time at the top like it does in excel.
I have so far figured out how to convert it the csv file without an encoding error by reading it in as UTF16 using Recoder python class to reencode it as utf 8.
import sys
import csv
import operator
import codecs
class Recoder(object):
def __init__(self, stream, decoder, encoder, eol='\r\n'):
self._stream = stream
self._decoder = decoder if isinstance(decoder, codecs.IncrementalDecoder) else codecs.getincrementaldecoder(decoder)()
self._encoder = encoder if isinstance(encoder, codecs.IncrementalEncoder) else codecs.getincrementalencoder(encoder)()
self._buf = ''
self._eol = eol
self._reachedEof = False
def read(self, size=None):
r = self._stream.read(size)
raw = self._decoder.decode(r, size is None)
return self._encoder.encode(raw)
def __iter__(self):
return self
def __next__(self):
if self._reachedEof:
raise StopIteration()
while True:
line,eol,rest = self._buf.partition(self._eol)
if eol == self._eol:
self._buf = rest
return self._encoder.encode(line + eol)
raw = self._stream.read(1024)
if raw == '':
self._decoder.decode(b'', True)
self._reachedEof = True
return self._encoder.encode(self._buf)
self._buf += self._decoder.decode(raw)
next = __next__
def close(self):
return self._stream.close()
writer = open('mycsv1.csv, 'wb')
f = open('mycsv.csv','rb'):
sr = Recoder(f, 'utf-16', 'utf-8')
s = sorted(csv.reader(sr), key=operator.itemgetter(0), reverse=True))
for row in s:
print >> writer, row[0], ",", row[1], "," row[2]
The issue is this only sorts it by the month from the looks of it. Say I had entries for months 1 - 6 for years 2010, 2011, 2012.
It would sort it by the month only and not include the time or date so that I get the newest dates only. Instead I get 1/1/2010, 1/1/2011, 1/1/2012, 2/1/2010, 2/1/2011, 2/1/2012.
If I sort it in excel, it will give me the newest dates/times first which in the case if it was based on this month of June (6/1/2012, 5/1/2012, 4/1/2012, etc.) Any help on how can I accomplish this using python is most appreciated.
UPDATE
The sample data I am working with is from autorunsc after its been formatted to utf8. The data in the CSV should be as follows:
Time, Entry Location, Entry, Enabled, Category, Description, Publisher, Launch String
6/23/2011 14:23, HKLM\System\CurrentControlSet\Services, JavaQuickStarterService, enabled, Services, Prefetches JRE files for faster startup, Oracle Corporation, C:\Program Files\java, C:\Program Files\Java\jre\blah
5/25/2006 1:14,,,,,,,,,
4/4/2003 22:10,,,,,,,,,
4/17/2006 11:11,,,,,,,,
0,,,,,,,,, (Some of the entries do not have date values and are null or blank.
6/10/2013 9:30,,,,,,,,,
6/23/2013 10:25,,,,,,,,,
etc
Most of these entries have values, but I didn't want to copy and paste everything. I basically want to sort the dates from newest date/time like excel does. The lambda option someone mentioned below errors out because it is reading in "Time" from the column first. I want to figure out how to skip the first row and use lambda on the other date/time values to sort appropriately.
Upvotes: 0
Views: 3392
Reputation: 3153
You can use pandas module and to_datetime() method.
Code :
import pandas as pd
data = pd.read_csv('mycsv.csv')
data['Time'] = pd.to_datetime(data['Time'], format="%m/%d/%Y %H:%M")
data = data.sort_values(by='Time', ascending=False)
print(data.to_csv(index=False))
Input : mycsv.csv
Time, Field
6/23/2011 14:23, ABC
5/25/2006 1:14, XYZ
4/4/2003 22:10, PQR
4/17/2006 11:11,GHI
, 0
, 1
6/10/2013 9:30, 2
6/23/2013 10:25, 3
Output :
Time, Field
2013-06-23 10:25:00, 3
2013-06-10 09:30:00, 2
2011-06-23 14:23:00, ABC
2006-05-25 01:14:00, XYZ
2006-04-17 11:11:00,GHI
2003-04-04 22:10:00, PQR
, 0
, 1
Upvotes: 0
Reputation: 21473
Ok, didn't fully understand what was going on before. You problem is that your "dates" are still strings and being sorted as such. I'm guessing your date format is Month/Day/Year (US date style) as you say it's sorted by Month. All you need to do is parse your dates into datetime objects to solve your sorting problem.
# add this import at the top of your file
from datetime import datetime
# replace your current call to sorted with:
s = sorted(csv.reader(sr), key=lambda x:datetime.strptime(x[0],"%m/%d/%Y"), reverse=True))
Upvotes: 1