mkohler

Reputation: 159

Split list into sub-lists based on integer in string

I have a list of strings like this:

['text_1.jpg', 'othertext_1.jpg', 'text_2.jpg', 'othertext_2.jpg', ...]

In reality there are more than two entries per number, but this is the general format. I would like to split this list into a list of lists, like so:

[['text_1.jpg', 'othertext_1.jpg'], ['text_2.jpg', 'othertext_2.jpg'], ...]

The sub-lists are based on the integer after the underscore. My current method is to first sort the list by that number, as in the first sample above, then iterate over the sorted list and append each value to the current sub-list if its integer matches the previous one, starting a new sub-list otherwise.
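
A rough sketch of that sort-then-iterate approach, for illustration only. The helper `number_of`, the sample input, and the loop details are assumptions rather than the original code, and every name is assumed to match the `_<digits>.jpg` pattern:

```python
import re


def number_of(name):
    # Hypothetical helper: the integer between "_" and ".jpg".
    # Assumes the pattern is present; re.search returns None otherwise.
    return int(re.search(r"_(\d+)\.jpg", name).group(1))


# Illustrative input, deliberately out of order.
names = ['text_2.jpg', 'text_1.jpg', 'othertext_1.jpg', 'othertext_2.jpg']
names.sort(key=number_of)  # stable sort: ties keep their original order

result = []
previous = None
for name in names:
    current = number_of(name)
    if current != previous:
        result.append([])  # the number changed, so start a new sub-list
        previous = current
    result[-1].append(name)

print(result)
```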

I am wondering if there is a simpler, more Pythonic way of performing this task.

Upvotes: 1

Views: 226

Answers (2)

Lei Yang

Reputation: 4335

Similar solution to @Andrej's:

import itertools
import re


def find_number(s):
    # The re module caches recently compiled patterns, so calling
    # re.search repeatedly is fine; use re.compile first if you prefer.
    return re.search(r'_(\d+)\.jpg', s).group(1)


l = ['text_1.jpg', 'othertext_1.jpg', 'text_2.jpg', 'othertext_2.jpg']
res = [list(v) for k, v in itertools.groupby(l, find_number)]
print(res)
#[['text_1.jpg', 'othertext_1.jpg'], ['text_2.jpg', 'othertext_2.jpg']]
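
One caveat: `itertools.groupby` only merges consecutive items that share a key, so the snippet above relies on the list already being ordered by number, as it is in the question. Below is a minimal sketch for unsorted input, assuming every name matches the `_<digits>.jpg` pattern. The helper name `number_of` and the sample list are made up for illustration.

```python
import itertools
import re


def number_of(name):
    # Hypothetical helper: the integer between "_" and ".jpg".
    # Converting to int makes 2 sort before 10, unlike the strings "2" and "10".
    return int(re.search(r'_(\d+)\.jpg', name).group(1))


# Illustrative unsorted input, not taken from the question.
names = ['text_10.jpg', 'text_2.jpg', 'othertext_10.jpg', 'othertext_2.jpg']

# Sort by the integer key first so equal keys become adjacent, then group.
ordered = sorted(names, key=number_of)
groups = [list(g) for _, g in itertools.groupby(ordered, key=number_of)]
print(groups)
```

Because `sorted` is stable, entries with the same number keep their original relative order inside each sub-list.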

Upvotes: 1

Andrej Kesely

Reputation: 195543

Try:

import re

lst = ["text_1.jpg", "othertext_1.jpg", "text_2.jpg", "othertext_2.jpg"]

r = re.compile(r"_(\d+)\.jpg")
out = {}
for val in lst:
    num = r.search(val).group(1)
    out.setdefault(num, []).append(val)

print(list(out.values()))

Prints:

[['text_1.jpg', 'othertext_1.jpg'], ['text_2.jpg', 'othertext_2.jpg']]
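
The dictionary approach does not need sorted input. Groups come out in order of first appearance, since dicts preserve insertion order in Python 3.7+. Below is a minimal sketch of a variant that returns the groups in ascending numeric order. The integer keys, the sample list, and the choice to skip non-matching names are all assumptions added for illustration, not part of the code above.

```python
import re
from collections import defaultdict

pattern = re.compile(r"_(\d+)\.jpg")

# Illustrative unsorted input, including one name without a number.
names = ["text_10.jpg", "text_2.jpg", "readme.txt",
         "othertext_2.jpg", "othertext_10.jpg"]

groups = defaultdict(list)
for name in names:
    match = pattern.search(name)
    if match is None:
        continue  # assumption: silently skip names that do not match
    groups[int(match.group(1))].append(name)

# Emit the sub-lists in ascending numeric order of their key.
result = [groups[key] for key in sorted(groups)]
print(result)
```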

Upvotes: 2
