John

Reputation: 13699

Sorting files in a list

Let's say I have a list of files

files = ['s1.txt', 'ai1.txt', 's2.txt', 'ai3.txt']

and I need to sort them into sub-lists based on the number in each name, so that

files = [['s1.txt', 'ai1.txt'], ['s2.txt'], ['ai3.txt']]

I could write a bunch of loops, but is there a better way to do this?

Upvotes: 5

Views: 351

Answers (4)

user647772


import itertools
import re

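# Capture the first run of digits; "^\D*" skips the non-digit prefix.
# (The original greedy "^.*([0-9]+).*$" only captured the last digit.)
r_number = re.compile(r"^\D*(\d+)")

def key_for_filename(filename):
    # Edit: This doesn't check for missing numbers.
    # int() makes 10 sort after 2 instead of before it.
    return int(r_number.match(filename).group(1))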

grouped = [list(v) for k, v in
           itertools.groupby(sorted(files, key=key_for_filename),
                             key_for_filename)]
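The "# Edit" comment above points out that a name without any digits makes match() return None, so .group(1) fails. One possible way to handle that, sketched here under the assumption that such names should simply be collected into a final group of their own, is to fall back to math.inf as the key. The file readme.txt is a made-up example input:

```python
import itertools
import math
import re

# Hypothetical input: readme.txt has no number in its name.
files = ['s1.txt', 'ai1.txt', 's2.txt', 'readme.txt', 'ai3.txt']

def key_for_filename(filename):
    # Assumption: names without digits get math.inf, so they sort last
    # and end up together in one trailing group.
    m = re.search(r"\d+", filename)
    return int(m.group()) if m else math.inf

grouped = [list(v) for k, v in
           itertools.groupby(sorted(files, key=key_for_filename),
                             key_for_filename)]
print(grouped)
```

Because sorted() is stable, files sharing a number keep their original relative order inside each group.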

Upvotes: 4

synthesizerpatel

Reputation: 28036

Something like this would work:

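#!/usr/bin/env python3

from itertools import groupby
import re
import pprint

def findGroup(record):
    # Capture the digits right before the ".txt" extension.
    return re.match(r".*?(\d+)\.txt$", record).group(1)

files = ['s1.txt', 'ai1.txt', 's2.txt', 'ai3.txt', 'foo1.txt', 'foo54.txt']

results = {}
# groupby() only groups *consecutive* items, so on unsorted input the
# same key can produce several separate groups.
for k, g in groupby(files, findGroup):
    # dict.has_key() no longer exists in Python 3; setdefault() replaces it.
    results.setdefault(k, []).append(list(g))

pprint.pprint(results)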

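Note that groupby() only groups consecutive items, so when the list isn't sorted first, a number that reappears later (like 1 below) gets a second sub-list. You end up with lists within lists, but you can collapse those easily enough.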

Example output:

{'1': [['s1.txt', 'ai1.txt'], ['foo1.txt']],
 '2': [['s2.txt']],
 '3': [['ai3.txt']],
 '54': [['foo54.txt']]}
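Collapsing the nested lists could look something like this. It is a sketch that starts from the dictionary printed above and uses itertools.chain.from_iterable() to concatenate the inner lists; the final numeric ordering is an assumption about the desired result:

```python
from itertools import chain
import pprint

# The nested dict produced by the groupby() loop on the unsorted files.
results = {'1': [['s1.txt', 'ai1.txt'], ['foo1.txt']],
           '2': [['s2.txt']],
           '3': [['ai3.txt']],
           '54': [['foo54.txt']]}

# Concatenate the inner lists that belong to the same key.
flat = {k: list(chain.from_iterable(v)) for k, v in results.items()}

# Order the groups numerically (as strings, '54' would sort before '6').
grouped = [flat[k] for k in sorted(flat, key=int)]
pprint.pprint(grouped)
```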

Upvotes: 0

NPE

Reputation: 500227

Here is a complete, working example based on defaultdict:

import re
from collections import defaultdict

files = ['s1.txt', 'ai1.txt', 's2.txt', 'ai3.txt']

def get_key(fname):
    return int(re.findall(r'\d+', fname)[0])

d = defaultdict(list)
for f in files:
    d[get_key(f)].append(f)

out = [d[k] for k in sorted(d.keys())]
print(out)

This produces:

[['s1.txt', 'ai1.txt'], ['s2.txt'], ['ai3.txt']]
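One detail in get_key() matters once numbers have more than one digit: it returns an int, so sorted(d.keys()) orders the groups numerically. A small comparison, using a made-up s10.txt and a hypothetical helper group_by_number(), shows what happens with string keys instead:

```python
import re
from collections import defaultdict

# Hypothetical input with a two-digit number.
files = ['s1.txt', 's10.txt', 's2.txt']

def group_by_number(files, key):
    # Same defaultdict approach as above, with a pluggable key function.
    d = defaultdict(list)
    for f in files:
        d[key(f)].append(f)
    return [d[k] for k in sorted(d)]

as_int = group_by_number(files, lambda f: int(re.findall(r'\d+', f)[0]))
as_str = group_by_number(files, lambda f: re.findall(r'\d+', f)[0])
print(as_int)  # [['s1.txt'], ['s2.txt'], ['s10.txt']]
print(as_str)  # [['s1.txt'], ['s10.txt'], ['s2.txt']]
```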

Upvotes: 6

Sven Marnach

Reputation: 601479

First, write a function that extracts the number from a file name:

import re

def file_number(name):
    # Search the given name (not a hard-coded string); int() gives numeric order.
    return int(re.search(r"\d+", name).group(0))

(Note that this function raises an AttributeError if there's no number at all in the name, because re.search() then returns None.)

Sort the list using this function as a key:

files.sort(key=file_number)

Group by this key using itertools.groupby():

import itertools

for number, group in itertools.groupby(files, file_number):
    # whatever; group is an iterator over the files with this number
    print(number, list(group))
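Put together, the three steps might look like this. It is a sketch that assumes every file name contains at least one digit and that the goal is the list of lists from the question. Each group is turned into a list inside the loop, because groupby() shares one underlying iterator and a group is no longer usable once the loop moves on to the next key:

```python
import itertools
import re

files = ['s1.txt', 'ai1.txt', 's2.txt', 'ai3.txt']

def file_number(name):
    # Assumes every name contains at least one digit.
    return int(re.search(r"\d+", name).group(0))

# Step 2: sort so that equal numbers are adjacent (the sort is stable).
files.sort(key=file_number)

# Step 3: group, materialising each group before advancing.
grouped = []
for number, group in itertools.groupby(files, file_number):
    grouped.append(list(group))

print(grouped)
```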

Upvotes: 1
