Dominic

Reputation: 403

Extract int between two different strings in Python

I have a list, files, of strings in the following format:

files = ['/misc/lmbraid17/bensch/u-net-3d/2dcellnet/2dcellnet_v6w4l1/2dcellnet_v6w4l1_snapshot_iter_418000.caffemodel.h5', 
'/misc/lmbraid17/bensch/u-net-3d/2dcellnet/2dcellnet_v6w4l1/2dcellnet_v6w4l1_snapshot_iter_502000.caffemodel.h5', ...]

I want to extract the int between iter_ and .caffemodel and return a list of those ints.

After some research I came up with this solution that does the trick, but I was wondering if there is a more elegant/pythonic way to do it, possibly using a list comprehension?

import re

li = []
for f in files:
    # First isolate "iter_<digits>.caffemodel", then pull the digits out of it.
    tmp = re.search(r'iter_\d+\.caffemodel', f).group()
    li.append(int(re.search(r'\d+', tmp).group()))

Upvotes: 1

Views: 133

Answers (5)

Padraic Cunningham

Reputation: 180441

Without a regex:

files = [
    '/misc/lmbraid17/bensch/u-net-3d/2dcellnet/2dcellnet_v6w4l1/2dcellnet_v6w4l1_snapshot_iter_418000.caffemodel.h5',
    '/misc/lmbraid17/bensch/u-net-3d/2dcellnet/2dcellnet_v6w4l1/2dcellnet_v6w4l1_snapshot_iter_502000.caffemodel.h5']

print([f.rsplit("_", 1)[1].split(".", 1)[0] for f in files])
['418000', '502000']

Or if you want to be more specific:

print([f.rsplit("iter_", 1)[1].split(".caffemodel", 1)[0] for f in files])

But your file names all seem to follow the same pattern, so the first solution is probably sufficient.

You can also slice using find and rfind:

print([f[f.find("iter_") + 5:f.rfind("caffe") - 1] for f in files])
['418000', '502000']
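
Note that the snippets above return strings, while the question asks for ints. A minimal sketch that wraps the first variant in int() could look like this (the name iters is only used for illustration):

```python
files = [
    '/misc/lmbraid17/bensch/u-net-3d/2dcellnet/2dcellnet_v6w4l1/2dcellnet_v6w4l1_snapshot_iter_418000.caffemodel.h5',
    '/misc/lmbraid17/bensch/u-net-3d/2dcellnet/2dcellnet_v6w4l1/2dcellnet_v6w4l1_snapshot_iter_502000.caffemodel.h5',
]

# Take the text after the last "_", cut it at the first ".", then convert to int.
iters = [int(f.rsplit("_", 1)[1].split(".", 1)[0]) for f in files]
print(iters)  # [418000, 502000]
```

This relies on the number being the only thing between the last underscore and the first following dot, which holds for the file names shown in the question.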

Upvotes: 1

Alex Belyaev

Reputation: 1445

Solution with list comprehension, as you wished:

import re

re_model_id = re.compile(r'iter_(?P<model_id>\d+)\.caffemodel')
li = [int(re_model_id.search(f).group('model_id')) for f in files]

Upvotes: 1

gil

Reputation: 2144

Just to add another possible solution: join the file names together into one big string (it looks like they all end with h5, so there is no danger of creating unwanted matches) and use re.findall on that:

import re
li = [int(d) for d in re.findall(r'iter_(\d+)\.caffemodel', ''.join(files))]
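
If you would rather not depend on how the file names end, one possible variation is to join with a newline instead of an empty string, so a match can never span two names. This is only a sketch, and the shortened file names below are hypothetical stand-ins for the real paths:

```python
import re

# Hypothetical, shortened file names in the same format as the question.
files = [
    'snapshot_iter_418000.caffemodel.h5',
    'snapshot_iter_502000.caffemodel.h5',
]

# \d and the literal pattern never match "\n", so each match stays inside one name.
li = [int(d) for d in re.findall(r'iter_(\d+)\.caffemodel', '\n'.join(files))]
print(li)  # [418000, 502000]
```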

Upvotes: 3

hek2mgl

Reputation: 158020

You can also use a lookbehind assertion:

import re

# Raw string so that \d reaches the regex engine unchanged.
regex = re.compile(r"(?<=iter_)\d+")

for f in files:
    number = regex.search(f).group(0)
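
To get the list of ints the question asks for, the same idea fits into a list comprehension. The sketch below also adds a lookahead for .caffemodel so both delimiters are checked; the shortened file names and the name numbers are assumptions made for illustration:

```python
import re

# Hypothetical, shortened file names in the same format as the question.
files = [
    'snapshot_iter_418000.caffemodel.h5',
    'snapshot_iter_502000.caffemodel.h5',
]

# Lookbehind requires "iter_" before the digits, lookahead requires ".caffemodel" after;
# neither is part of the match, so group(0) is just the digits.
regex = re.compile(r"(?<=iter_)\d+(?=\.caffemodel)")

numbers = [int(regex.search(f).group(0)) for f in files]
print(numbers)  # [418000, 502000]
```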

Upvotes: 1

martin

Reputation: 96899

Just use a capturing group:

import re

li = []
for f in files:
    tmp = int(re.search(r'iter_(\d+)\.caffemodel', f).group(1))
    li.append(tmp)

If you put part of the pattern in parentheses, it creates a capturing group; group(1) returns only the text matched inside the first pair of parentheses.
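
A small sketch of the difference between group(0) and group(1), using a hypothetical shortened file name. It also shows that re.search returns None when nothing matches, which is worth guarding against if some file names might not follow the pattern:

```python
import re

f = 'snapshot_iter_418000.caffemodel.h5'  # hypothetical shortened file name

m = re.search(r'iter_(\d+)\.caffemodel', f)
print(m.group(0))  # iter_418000.caffemodel  (the whole match)
print(m.group(1))  # 418000                  (only the parenthesized part)

# re.search returns None when nothing matches, so check before calling group().
no_match = re.search(r'iter_(\d+)\.caffemodel', 'README.txt')
print(no_match is None)  # True
```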

Upvotes: 2
