Reputation: 109
I have downloaded several years of data stored in files named with the convention year_day.dat. For example, the file named 2014_1.dat has the data for January 1, 2014. I need to read these data files ordered by day: 2014_1.dat, 2014_2.dat, 2014_3.dat, and so on until the end of the year. In the folder they are listed in that order, but when I create a list of the files in the directory they are reordered: 2014_1.dat, 2014_10.dat, 2014_100.dat, 2014_101.dat ... 2014_199.dat, 2014_2.dat. I think I need to use a sort function, but how do I force it to sort the listed files by day so I can continue processing them? Here's the code so far:
import sys, os, gzip, fileinput, collections

# Set the input/output directories
wrkDir = "C:/LJBTemp"
inDir = wrkDir + "/Input"
outDir = wrkDir + "/Output"

# here we go
inList = os.listdir(inDir)  # List all the files in the 'Input' directory
print inList  # print to screen reveals 2014_1.dat.gz followed by 2014_10.dat.gz NOT 2014_2.dat.gz HELP
d = {}
for fileName in inList:  # Step through each input file
    readFileName = inDir + "/" + fileName
    with gzip.open(readFileName, 'r') as f:  # call built in utility to unzip file for reading
        for line in f:
            city, long, lat, elev, temp = line.split()  # create dictionary
            d.setdefault(city, []).append(temp)  # populate dictionary with city and associated temp data from each input file
            collections.OrderedDict(sorted(d.items(), key=lambda d: d[0]))  # QUESTION? why doesn't this work

# now collect and write to output file
outFileName = outDir + "/" + "1981_maxT.dat"  # create output file in output directory with .dat extension
with open(outFileName, 'w') as f:
    for city, values in d.items():
        f.write('{} {}\n'.format(city, ' '.join(values)))
print "All done!!"
raw_input("Press <enter>")  # this keeps the window open until you press "enter"
Upvotes: 4
Views: 11064
Reputation: 48745
If you don't mind using a third-party library, you can use natsort, which was designed for exactly this situation.
import natsort
inList = natsort.natsorted(os.listdir(inDir))
This should take care of all the numerical sorting without having to worry about the details.
You can also use the ns.PATH option to make the sorting algorithm path-aware:
from natsort import natsorted, ns
inList = natsorted(os.listdir(inDir), alg=ns.PATH)
Full disclosure, I am the natsort author.
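If installing a third-party package is not an option, the same idea can be roughly approximated with the standard library. The sketch below is not natsort's implementation; natural_key is a hypothetical helper name, and the file names are made up to match the question's naming convention. It splits each name into digit and non-digit runs so that the digit runs compare as integers rather than as text.

```python
import re

def natural_key(name):
    """Split a name into text and integer chunks so digit runs compare numerically."""
    # A capturing group in re.split keeps the digit runs in the result list.
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r'(\d+)', name)]

# Hypothetical file names following the question's year_day convention.
names = ['2014_1.dat.gz', '2014_10.dat.gz', '2014_100.dat.gz', '2014_2.dat.gz']
ordered = sorted(names, key=natural_key)
print(ordered)
```

In the question's script this would amount to sorting the result of os.listdir(inDir) with key=natural_key before the loop over the files.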
Upvotes: 3
Reputation: 369164
dict.items returns a list of (key, item) pairs. The key function uses only the first element of each pair (d[0] => key => city).
There's another problem: sorted returns a new, sorted copy of the list and does not sort the list in place. Also, the OrderedDict object is created but not assigned anywhere. In fact, you don't need to sort each time you append an item to the list.
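A small sketch of the difference, using made-up city data in the same shape as the question's dictionary (the city names and temperatures are assumptions for illustration only). It shows that sorted leaves its input untouched, that list.sort works in place, and that the OrderedDict only becomes useful once it is bound to a name:

```python
import collections

# Made-up data in the shape the question builds: city -> list of temperature strings.
d = {'Oslo': ['1.5'], 'Cairo': ['30.1'], 'Lima': ['19.0']}

items = list(d.items())
result = sorted(items, key=lambda pair: pair[0])  # new list, ordered by city
# 'items' is unchanged by sorted(); only 'result' is ordered.

returned = items.sort(key=lambda pair: pair[0])   # sorts 'items' in place, returns None

# Bind the OrderedDict to a name, once, after all the data has been collected.
ordered = collections.OrderedDict(sorted(d.items(), key=lambda pair: pair[0]))
for city, temps in ordered.items():
    print('{} {}'.format(city, ' '.join(temps)))
```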
Removing the ... sorted ... line and replacing the following lines:
with open(outFileName, 'w') as f:
    for city, values in d.items():
        f.write('{} {}\n'.format(city, ' '.join(values)))
with the following will solve your problem:
with open(outFileName, 'w') as f:
    for city, values in d.items():
        values.sort(key=lambda fn: map(int, os.path.splitext(fn)[0].split('_')))
        f.write('{} {}\n'.format(city, ' '.join(values)))
BTW, instead of joining paths manually with a hard-coded / separator, use os.path.join:
inDir + "/" + fileName => os.path.join(inDir, fileName)
Upvotes: 0
Reputation: 11
Try this if all of your files start with '2014_':
sorted(inList, key = lambda k: int(k.split('_')[1].split('.')[0]))
Otherwise take advantage of tuple comparison, sorting by the year first then the second part of your file name.
sorted(inList, key = lambda k: (int(k.split('_')[0]), int(k.split('_')[1].split('.')[0])))
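As a quick check of the tuple-comparison approach, here is a minimal sketch with the key pulled out into a named function. year_day_key is a hypothetical name, and the file names are invented to cover two years; it assumes every name follows the year_day pattern, with any extensions after the first dot.

```python
def year_day_key(file_name):
    """Return (year, day) as integers for names like '2014_10.dat' or '2014_10.dat.gz'."""
    stem = file_name.split('.')[0]   # e.g. '2014_10'
    year, day = stem.split('_')
    return int(year), int(day)

# Invented names spanning two years, deliberately out of order.
names = ['2014_10.dat', '2013_200.dat', '2014_2.dat', '2013_9.dat', '2014_1.dat']
ordered = sorted(names, key=year_day_key)
print(ordered)
```

Tuples compare element by element, so all 2013 files come before any 2014 file, and within a year the days sort numerically. A name that does not match the pattern would raise a ValueError here, which may be preferable to silently misordering it.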
Upvotes: 0