user2292661
user2292661

Reputation: 155

How to sort a csv file column by newest time

I am trying to read in a csv file which is a result from running Autoruns autorunsc command line (i.e., autorunsc -a -m -c > mycsv.csv) and sort it by time in a way which I get the newest date/time at the top like it does in excel.

I have so far figured out how to convert it the csv file without an encoding error by reading it in as UTF16 using Recoder python class to reencode it as utf 8.

import sys
import csv
import operator
import codecs

class Recoder(object):
    def __init__(self, stream, decoder, encoder, eol='\r\n'):
        self._stream = stream
        self._decoder = decoder if isinstance(decoder, codecs.IncrementalDecoder) else codecs.getincrementaldecoder(decoder)()
        self._encoder = encoder if isinstance(encoder, codecs.IncrementalEncoder) else codecs.getincrementalencoder(encoder)()
        self._buf = ''
        self._eol = eol
        self._reachedEof = False

    def read(self, size=None):
        r = self._stream.read(size)
        raw = self._decoder.decode(r, size is None)
        return self._encoder.encode(raw)

    def __iter__(self):
        return self

    def __next__(self):
        if self._reachedEof:
            raise StopIteration()
        while True:
            line,eol,rest = self._buf.partition(self._eol)
            if eol == self._eol:
                self._buf = rest
                return self._encoder.encode(line + eol)
            raw = self._stream.read(1024)
            if raw == '':
                self._decoder.decode(b'', True)
                self._reachedEof = True
                return self._encoder.encode(self._buf)
            self._buf += self._decoder.decode(raw)
    next = __next__

    def close(self):
        return self._stream.close()


writer = open('mycsv1.csv, 'wb')
f = open('mycsv.csv','rb'):
sr = Recoder(f, 'utf-16', 'utf-8')
s = sorted(csv.reader(sr), key=operator.itemgetter(0), reverse=True))

for row in s:
    print >> writer, row[0], ",", row[1], "," row[2]

The issue is this only sorts it by the month from the looks of it. Say I had entries for months 1 - 6 for years 2010, 2011, 2012.

It would sort it by the month only and not include the time or date so that I get the newest dates only. Instead I get 1/1/2010, 1/1/2011, 1/1/2012, 2/1/2010, 2/1/2011, 2/1/2012.

If I sort it in excel, it will give me the newest dates/times first which in the case if it was based on this month of June (6/1/2012, 5/1/2012, 4/1/2012, etc.) Any help on how can I accomplish this using python is most appreciated.

UPDATE

The sample data I am working with is from autorunsc after its been formatted to utf8. The data in the CSV should be as follows:

Time, Entry Location, Entry, Enabled, Category, Description, Publisher, Launch String
6/23/2011 14:23, HKLM\System\CurrentControlSet\Services, JavaQuickStarterService, enabled, Services, Prefetches JRE files for faster startup, Oracle Corporation, C:\Program Files\java, C:\Program Files\Java\jre\blah
5/25/2006 1:14,,,,,,,,,
4/4/2003 22:10,,,,,,,,,
4/17/2006 11:11,,,,,,,,
0,,,,,,,,, (Some of the entries do not have date values and are null or blank.
6/10/2013 9:30,,,,,,,,,
6/23/2013 10:25,,,,,,,,,
etc

Most of these entries have values, but I didn't want to copy and paste everything. I basically want to sort the dates from newest date/time like excel does. The lambda option someone mentioned below errors out because it is reading in "Time" from the column first. I want to figure out how to skip the first row and use lambda on the other date/time values to sort appropriately.

Upvotes: 0

Views: 3392

Answers (2)

RAVI
RAVI

Reputation: 3153

You can use pandas module and to_datetime() method.

Code :

import pandas as pd

data = pd.read_csv('mycsv.csv')
data['Time'] = pd.to_datetime(data['Time'], format="%m/%d/%Y %H:%M")

data = data.sort_values(by='Time', ascending=False)
print(data.to_csv(index=False))

Input : mycsv.csv

Time, Field
6/23/2011 14:23, ABC
5/25/2006 1:14, XYZ
4/4/2003 22:10, PQR
4/17/2006 11:11,GHI
, 0
, 1
6/10/2013 9:30, 2
6/23/2013 10:25, 3

Output :

Time, Field
2013-06-23 10:25:00, 3
2013-06-10 09:30:00, 2
2011-06-23 14:23:00, ABC
2006-05-25 01:14:00, XYZ
2006-04-17 11:11:00,GHI
2003-04-04 22:10:00, PQR
, 0
, 1

Upvotes: 0

Endophage
Endophage

Reputation: 21473

Ok, didn't fully understand what was going on before. You problem is that your "dates" are still strings and being sorted as such. I'm guessing your date format is Month/Day/Year (US date style) as you say it's sorted by Month. All you need to do is parse your dates into datetime objects to solve your sorting problem.

# add this import at the top of your file
from datetime import datetime

# replace your current call to sorted with:
s = sorted(csv.reader(sr), key=lambda x:datetime.strptime(x[0],"%m/%d/%Y"), reverse=True))

Upvotes: 1

Related Questions