Reputation: 403
I have a list, files, of strings of the following format:
files = ['/misc/lmbraid17/bensch/u-net-3d/2dcellnet/2dcellnet_v6w4l1/2dcellnet_v6w4l1_snapshot_iter_418000.caffemodel.h5',
'/misc/lmbraid17/bensch/u-net-3d/2dcellnet/2dcellnet_v6w4l1/2dcellnet_v6w4l1_snapshot_iter_502000.caffemodel.h5', ...]
I want to extract the int between iter_ and .caffemodel and return a list of those ints.
After some research I came up with this solution that does the trick, but I was wondering if there is a more elegant/pythonic way to do it, possibly using a list comprehension?
import re
li = []
for f in files:
    # First isolate "iter_<digits>.caffemodel", then pull the digits out of it.
    tmp = re.search(r'iter_\d+\.caffemodel', f).group()
    li.append(int(re.search(r'\d+', tmp).group()))
Upvotes: 1
Views: 133
Reputation: 180441
Without a regex:
files = [
'/misc/lmbraid17/bensch/u-net-3d/2dcellnet/2dcellnet_v6w4l1/2dcellnet_v6w4l1_snapshot_iter_418000.caffemodel.h5',
'/misc/lmbraid17/bensch/u-net-3d/2dcellnet/2dcellnet_v6w4l1/2dcellnet_v6w4l1_snapshot_iter_502000.caffemodel.h5']
print([f.rsplit("_", 1)[1].split(".", 1)[0] for f in files])
['418000', '502000']
Or if you want to be more specific:
print([f.rsplit("iter_", 1)[1].split(".caffemodel", 1)[0] for f in files])
But your pattern seems to repeat so the first solution is probably sufficient.
You can also slice using find and rfind:
print([f[f.find("iter_") + 5:f.rfind("caffe") - 1] for f in files])
['418000', '502000']
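The string-method variants above return strings, while the question asks for ints. A minimal sketch of the more specific split, wrapped in a helper and converted with int(); the function name iteration_of is only an illustrative choice, not something from the question:

```python
files = [
    '/misc/lmbraid17/bensch/u-net-3d/2dcellnet/2dcellnet_v6w4l1/2dcellnet_v6w4l1_snapshot_iter_418000.caffemodel.h5',
    '/misc/lmbraid17/bensch/u-net-3d/2dcellnet/2dcellnet_v6w4l1/2dcellnet_v6w4l1_snapshot_iter_502000.caffemodel.h5',
]


def iteration_of(path):
    """Hypothetical helper: return the int between 'iter_' and '.caffemodel'."""
    # Everything after the last "iter_" ...
    tail = path.rsplit("iter_", 1)[1]
    # ... up to the first ".caffemodel", converted to an int.
    return int(tail.split(".caffemodel", 1)[0])


iterations = [iteration_of(f) for f in files]
print(iterations)  # [418000, 502000]
```

If a path does not contain iter_, the [1] index raises an IndexError, so this assumes every entry follows the pattern shown in the question.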
Upvotes: 1
Reputation: 1445
Solution with list comprehension, as you wished:
import re
re_model_id = re.compile(r'iter_(?P<model_id>\d+)\.caffemodel')
li = [int(re_model_id.search(f).group('model_id')) for f in files]
Upvotes: 1
Reputation: 2144
Just to add another possible solution: join the file names together into one big string (it looks like they all end with h5, so there is no danger of creating unwanted matches) and use re.findall on that:
import re
li = [int(d) for d in re.findall(r'iter_(\d+)\.caffemodel', ''.join(files))]
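One property of this approach worth seeing in action: re.findall returns only the matches it finds, so a file name that does not fit the pattern is skipped silently and the result can be shorter than files. A small sketch, where the third path is a made-up entry added purely for illustration:

```python
import re

files = [
    '/misc/lmbraid17/bensch/u-net-3d/2dcellnet/2dcellnet_v6w4l1/2dcellnet_v6w4l1_snapshot_iter_418000.caffemodel.h5',
    '/misc/lmbraid17/bensch/u-net-3d/2dcellnet/2dcellnet_v6w4l1/2dcellnet_v6w4l1_snapshot_iter_502000.caffemodel.h5',
    '/tmp/readme.txt',  # hypothetical non-matching entry, not from the question
]

# One pass over the concatenated names; each capture is the digit run.
li = [int(d) for d in re.findall(r'iter_(\d+)\.caffemodel', ''.join(files))]

print(li)                    # [418000, 502000]
print(len(li), len(files))   # 2 3
```

If you need one result per input file, or want a failure on unexpected names, a per-file re.search is probably the safer choice.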
Upvotes: 3
Reputation: 158020
You can also use a lookbehind assertion:
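import re
# (?<=iter_) asserts that the digits are preceded by "iter_" without including it in the match.
regex = re.compile(r"(?<=iter_)\d+")
for f in files:
    number = regex.search(f).group(0)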
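The loop above finds each number as a string but does not collect the results. A sketch of the same lookbehind pattern as a list comprehension that returns ints, assuming every path contains iter_ followed by digits:

```python
import re

files = [
    '/misc/lmbraid17/bensch/u-net-3d/2dcellnet/2dcellnet_v6w4l1/2dcellnet_v6w4l1_snapshot_iter_418000.caffemodel.h5',
    '/misc/lmbraid17/bensch/u-net-3d/2dcellnet/2dcellnet_v6w4l1/2dcellnet_v6w4l1_snapshot_iter_502000.caffemodel.h5',
]

# The lookbehind keeps "iter_" out of the match, so group(0) is just the digits.
regex = re.compile(r"(?<=iter_)\d+")
numbers = [int(regex.search(f).group(0)) for f in files]

print(numbers)  # [418000, 502000]
```

Note that this pattern does not check for the trailing .caffemodel, so it is slightly less strict than the other regex answers.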
Upvotes: 1
Reputation: 96899
Just use a capturing group:
import re
li = []
for f in files:
    tmp = int(re.search(r'iter_(\d+)\.caffemodel', f).group(1))
    li.append(tmp)
If you put part of the pattern in parentheses, it becomes a capturing group, and the text it matched can be retrieved on its own with group(1).
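To make the difference concrete, here is a short sketch comparing the whole match with the captured group on one of the paths from the question, followed by the same idea as a list comprehension:

```python
import re

files = [
    '/misc/lmbraid17/bensch/u-net-3d/2dcellnet/2dcellnet_v6w4l1/2dcellnet_v6w4l1_snapshot_iter_418000.caffemodel.h5',
    '/misc/lmbraid17/bensch/u-net-3d/2dcellnet/2dcellnet_v6w4l1/2dcellnet_v6w4l1_snapshot_iter_502000.caffemodel.h5',
]

match = re.search(r'iter_(\d+)\.caffemodel', files[0])
print(match.group(0))  # iter_418000.caffemodel  (the whole match)
print(match.group(1))  # 418000                  (only the parenthesised part)

# The same pattern applied to every file, converting each capture to int.
li = [int(re.search(r'iter_(\d+)\.caffemodel', f).group(1)) for f in files]
print(li)  # [418000, 502000]
```

re.search returns None when nothing matches, so calling .group on a non-matching path raises an AttributeError; this sketch assumes every entry matches.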
Upvotes: 2