Reputation: 13699
Let's say I have a list of files
files = ['s1.txt', 'ai1.txt', 's2.txt', 'ai3.txt']
and I need to sort them into sub-lists based on the number in each file name, so that
files = [['s1.txt', 'ai1.txt'], ['s2.txt'], ['ai3.txt']]
I could write a bunch of loops, but is there a better way to do this?
Upvotes: 5
Views: 351
Reputation:
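import itertools
import re

# "\D*" can only skip non-digits, so the group captures the whole first
# run of digits ("ai13.txt" -> "13") instead of just its last digit.
r_number = re.compile(r"^\D*([0-9]+)")

def key_for_filename(filename):
    # Edit: This doesn't check for missing numbers.
    # int() so that e.g. 10 sorts after 2
    return int(r_number.match(filename).group(1))

grouped = [list(v) for k, v in
           itertools.groupby(sorted(files, key=key_for_filename),
                             key_for_filename)]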
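A quick check of why the capture pattern matters: with a greedy leading ".*" (as in "^.*([0-9]+).*$"), the regex engine backtracks only as far as it must, so the group receives just the last digit of a multi-digit number. The file name "ai13.txt" below is a made-up example.

```python
import re

# Greedy ".*" swallows as much as possible, leaving one digit for the group.
greedy = re.compile("^.*([0-9]+).*$")

# "\D*" cannot consume digits, so the group starts at the first digit.
first_run = re.compile(r"^\D*([0-9]+)")

print(greedy.match("ai13.txt").group(1))     # prints 3
print(first_run.match("ai13.txt").group(1))  # prints 13
```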
Upvotes: 4
Reputation: 28036
Something like this would work:
#!/usr/bin/python
from itertools import groupby
import re
import pprint

def findGroup(record):
    # Digits immediately before ".txt", e.g. "foo54.txt" -> "54"
    return re.match(r".*?(\d+)\.txt$", record).group(1)

files = ['s1.txt', 'ai1.txt', 's2.txt', 'ai3.txt', 'foo1.txt', 'foo54.txt']
results = {}
for k, g in groupby(files, findGroup):
    # dict.has_key() was removed in Python 3; setdefault() works in 2 and 3
    results.setdefault(k, []).append(list(g))

pprint.pprint(results)
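Note that groupby() only merges consecutive items with equal keys. Because files is not sorted here, files with the same number that are not adjacent end up in separate sublists (lists within lists), but you can collapse those easily enough.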
Example output:
{'1': [['s1.txt', 'ai1.txt'], ['foo1.txt']],
'2': [['s2.txt']],
'3': [['ai3.txt']],
'54': [['foo54.txt']]}
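One way to do the collapsing step is itertools.chain.from_iterable, which flattens each key's list of runs into one list. This is a minimal sketch that repeats the grouping so it runs on its own; find_group is just a renamed copy of findGroup.

```python
import itertools
import re

def find_group(record):
    # Digits immediately before ".txt", e.g. "foo54.txt" -> "54"
    return re.match(r".*?(\d+)\.txt$", record).group(1)

files = ['s1.txt', 'ai1.txt', 's2.txt', 'ai3.txt', 'foo1.txt', 'foo54.txt']

results = {}
for k, g in itertools.groupby(files, find_group):
    results.setdefault(k, []).append(list(g))

# Flatten each key's list of runs into a single list of file names.
collapsed = {k: list(itertools.chain.from_iterable(runs))
             for k, runs in results.items()}
print(collapsed)
# {'1': ['s1.txt', 'ai1.txt', 'foo1.txt'], '2': ['s2.txt'], '3': ['ai3.txt'], '54': ['foo54.txt']}
```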
Upvotes: 0
Reputation: 500227
Here is a complete, working example based on defaultdict:
import re
from collections import defaultdict

files = ['s1.txt', 'ai1.txt', 's2.txt', 'ai3.txt']

def get_key(fname):
    return int(re.findall(r'\d+', fname)[0])

d = defaultdict(list)
for f in files:
    d[get_key(f)].append(f)

out = [d[k] for k in sorted(d.keys())]
print(out)
This produces:
[['s1.txt', 'ai1.txt'], ['s2.txt'], ['ai3.txt']]
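The int() in get_key matters once a number has more than one digit: string keys sort lexicographically, so "10" comes before "2". A small comparison, using a hypothetical file list and a hypothetical helper group_by_key that wraps the defaultdict approach:

```python
import re
from collections import defaultdict

def int_key(fname):
    return int(re.findall(r'\d+', fname)[0])

def str_key(fname):
    return re.findall(r'\d+', fname)[0]

def group_by_key(names, key):
    # Same defaultdict grouping as above, parameterised on the key function.
    d = defaultdict(list)
    for f in names:
        d[key(f)].append(f)
    return [d[k] for k in sorted(d)]

files = ['s2.txt', 'ai10.txt', 's1.txt']
print(group_by_key(files, int_key))  # [['s1.txt'], ['s2.txt'], ['ai10.txt']]
print(group_by_key(files, str_key))  # [['s1.txt'], ['ai10.txt'], ['s2.txt']]
```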
Upvotes: 6
Reputation: 601479
First, write a function that extracts the number from a file name:
import re

def file_number(name):
    # First run of digits in the name, as an int so that 10 sorts after 2
    return int(re.search(r"\d+", name).group(0))
(Note that this function raises an AttributeError if there's no number at all in the name, because re.search() then returns None.)
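If some names may lack a number, one option is a key that returns None instead of raising, and filtering those names out before sorting (in Python 3, sorting a mix of None and int raises a TypeError). file_number_or_none is a hypothetical name used only for this sketch.

```python
import re

def file_number_or_none(name):
    # Hypothetical tolerant variant: first run of digits as an int, or None.
    m = re.search(r"\d+", name)
    return int(m.group(0)) if m is not None else None

files = ['s1.txt', 'readme.txt', 'ai3.txt']
# Drop names without a number so the remaining keys are all ints.
numbered = [f for f in files if file_number_or_none(f) is not None]
print(numbered)  # ['s1.txt', 'ai3.txt']
```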
Sort the list using this function as a key:
files.sort(key=file_number)
Group by this key using itertools.groupby():
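import itertools

for number, group in itertools.groupby(files, file_number):
    # group is an iterator over the names sharing this number; use it here
    print(number, list(group))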
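Putting the steps together, a minimal end-to-end sketch that builds the nested list from the question. The key is converted to int here so that multi-digit numbers sort numerically; list.sort() is stable, so 's1.txt' stays ahead of 'ai1.txt'.

```python
import itertools
import re

def file_number(name):
    # First run of digits in the name, as an int
    return int(re.search(r"\d+", name).group(0))

files = ['s1.txt', 'ai1.txt', 's2.txt', 'ai3.txt']

# groupby only merges consecutive equal keys, so sort by the same key first.
files.sort(key=file_number)
grouped = [list(group) for number, group in itertools.groupby(files, file_number)]
print(grouped)  # [['s1.txt', 'ai1.txt'], ['s2.txt'], ['ai3.txt']]
```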
Upvotes: 1