MapleMatrix

Reputation: 109

Sort os.listdir files Python

I have downloaded several years of data stored in files with the naming convention year_day.dat. For example, the file named 2014_1.dat has the data for January 1, 2014. I need to read these data files ordered by day: 2014_1.dat, 2014_2.dat, 2014_3.dat, and so on until the end of the year. In the folder they are listed in that order, BUT when I create a list of the files in the directory they are reordered: 2014_1.dat, 2014_10.dat, 2014_100.dat, 2014_101.dat ... 2014_199.dat, 2014_2.dat. I think I need to use a sort function, but how do I force it to sort the listed files by day so I can continue processing them? Here's the code so far:

import sys, os, gzip, fileinput, collections
# Set the input/output directories
wrkDir = "C:/LJBTemp"
inDir = wrkDir + "/Input"
outDir = wrkDir + "/Output"
# here we go
inList = os.listdir(inDir)  # List all the files in the 'Input' directory
print inList  #print to screen reveals 2014_1.dat.gz followed by 2014_10.dat.gz NOT    2014_2.dat.gz HELP
d = {}
for fileName in inList:     # Step through each input file 
    readFileName = inDir + "/" + fileName

    with gzip.open(readFileName, 'r') as f: #call built in utility to unzip file for reading
      for line in f:
          city, long, lat, elev, temp = line.split() #create dictionary
          d.setdefault(city, []).append(temp) #populate dictionary with city and associated temp data from each input file
          collections.OrderedDict(sorted(d.items(), key=lambda d: d[0])) # QUESTION? why doesn't this work
          #now collect and write to output file
outFileName = outDir + "/" + "1981_maxT.dat" #create output file in output directory with .dat extension
with open(outFileName, 'w') as f:
     for city, values in d.items():
        f.write('{} {}\n'.format(city, ' '.join(values)))

print "All done!!"
raw_input("Press <enter>") # this keeps the window open until you press "enter"

Upvotes: 4

Views: 11064

Answers (3)

SethMMorton

Reputation: 48745

If you don't mind using third-party libraries, you can use the natsort library, which was designed for exactly this situation.

import natsort
inList = natsort.natsorted(os.listdir(inDir))

This should take care of all the numerical sorting without having to worry about the details.
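For a rough idea of what natural sorting does, here is a minimal standard-library sketch. It is not natsort's implementation, and `natural_key` is a hypothetical helper name; the file names are the ones from the question.

```python
import re

def natural_key(name):
    # Split the name into alternating non-digit and digit runs,
    # e.g. "2014_10.dat" -> ['', '2014', '_', '10', '.dat'],
    # then turn the digit runs into ints so 10 sorts after 2.
    return [int(part) if part.isdigit() else part
            for part in re.split(r'(\d+)', name)]

names = ["2014_1.dat", "2014_10.dat", "2014_100.dat", "2014_2.dat"]
ordered = sorted(names, key=natural_key)
print(ordered)
```

A plain `sorted(names)` compares the names character by character, which is why 2014_10.dat lands before 2014_2.dat.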

You can also use the ns.PATH option to make the sorting algorithm path-aware:

from natsort import natsorted, ns
inList = natsorted(os.listdir(inDir), alg=ns.PATH)

Full disclosure, I am the natsort author.

Upvotes: 3

falsetru

Reputation: 369164

dict.items returns a list of (key, value) pairs.

The key function uses only the first element of each pair (d[0] => key => city).

There's another problem: sorted returns a new sorted list and does not sort the original in place. Also, the OrderedDict object is created but never assigned to anything. In fact, you don't need to sort each time you append an item to the list.
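A small sketch of that point, using made-up city data rather than the real input files: the sorted result only becomes usable once it is bound to a name, and doing so once after the loop is enough. The parenthesised print also runs on Python 3.

```python
import collections

# Hypothetical stand-in for the dictionary built in the question
d = {"Oslo": ["3"], "Cairo": ["30"], "Lima": ["19"]}

# sorted() builds a new list of (city, temps) pairs and leaves d as it was,
# so the OrderedDict has to be assigned to keep the ordering.
by_city = collections.OrderedDict(sorted(d.items(), key=lambda item: item[0]))

print(list(by_city.keys()))
```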

Remove the ... sorted ... line and replace the following lines:

with open(outFileName, 'w') as f:
     for city, values in d.items():
        f.write('{} {}\n'.format(city, ' '.join(values)))

with the following, which will solve your problem:

with open(outFileName, 'w') as f:
     for city, values in d.items():
        values.sort(key=lambda fn: map(int, os.path.splitext(fn)[0].split('_')))
        f.write('{} {}\n'.format(city, ' '.join(values)))

BTW, instead of manually joining with a hard-coded / separator, use os.path.join:

inDir + "/" + fileName

 =>

os.path.join(inDir, fileName)
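One way to put this together with a numeric sort key is to order the directory listing before building the paths. This is only a sketch: `year_day_key` is a hypothetical helper, the hard-coded `listing` stands in for os.listdir(in_dir), and it assumes every name looks like year_day followed by one or more extensions.

```python
import os

def year_day_key(file_name):
    # "2014_10.dat.gz" -> "2014_10" -> (2014, 10)
    stem = file_name.split('.')[0]
    year, day = stem.split('_')
    return (int(year), int(day))

in_dir = "C:/LJBTemp/Input"  # input directory from the question
# Stand-in for os.listdir(in_dir), so the sketch runs without the data files
listing = ["2014_1.dat.gz", "2014_10.dat.gz", "2014_2.dat.gz"]

# Sort by (year, day) first, then build each path with os.path.join
paths = [os.path.join(in_dir, name)
         for name in sorted(listing, key=year_day_key)]
print(paths)
```

Splitting on the first dot rather than using os.path.splitext is a deliberate choice here, because splitext only strips the final extension and would leave "2014_1.dat" for a .dat.gz file.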

Upvotes: 0

Ruthenium.

Reputation: 11

Try this if all of your files start with '2014_':

sorted(inList, key = lambda k: int(k.split('_')[1].split('.')[0]))

Otherwise, take advantage of tuple comparison, sorting by the year first and then by the day part of the file name.

sorted(inList, key = lambda k: (int(k.split('_')[0]), int(k.split('_')[1].split('.')[0])))
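To see why the tuple key works, here is a short check with a made-up list spanning two years. Tuples compare element by element, so the year decides first and the day only breaks ties.

```python
# Hypothetical file names covering two years
names = ["2014_2.dat", "2013_365.dat", "2014_10.dat", "2014_1.dat"]

def year_then_day(k):
    # Same key as the lambda above, written out as a function
    return (int(k.split('_')[0]), int(k.split('_')[1].split('.')[0]))

ordered = sorted(names, key=year_then_day)
print(ordered)

# The year is compared before the day
print((2013, 365) < (2014, 1))
```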

Upvotes: 0
