Reputation: 5885
I have the following string:
diff_pr_EUR-44_cordex_rcp45_mon_ave_2048-2060_minus_2005-2017_mon10_ave1_withsd.nc
I would like to use a regex to extract and build the following string:
rcp45_mon10
So far I have tried this pattern in an online regex tester:
rcp\d\d[^.]+mon\d+
but it matches more than I need:
rcp45_mon_ave_2048-2060_minus_2005-2017_mon10
How can I get the regex to skip the characters in between until it reaches the mon10 part?
Thanks
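To see why the attempt above over-matches, here is a small reproduction using Python's re module (the module the answers below also use). A single regex match is always one contiguous slice of the input, so even a lazy [^.]+? still spans everything between rcp45 and mon10. The two pieces have to be captured separately and then joined.

```python
import re

s = 'diff_pr_EUR-44_cordex_rcp45_mon_ave_2048-2060_minus_2005-2017_mon10_ave1_withsd.nc'

# The original attempt: [^.]+ is greedy, runs up to the '.', then backtracks
# only as far as needed for mon\d+ to match.
greedy = re.search(r'rcp\d\d[^.]+mon\d+', s).group()
print(greedy)  # rcp45_mon_ave_2048-2060_minus_2005-2017_mon10

# Making the quantifier lazy does not help: a match is one contiguous
# substring, so everything between rcp45 and mon10 is still included.
lazy = re.search(r'rcp\d\d[^.]+?mon\d+', s).group()
print(lazy)  # same as above

# Capturing the two parts in separate groups and joining them works.
m = re.search(r'(rcp\d\d)[^.]+?(_mon\d+)', s)
print(m.group(1) + m.group(2))  # rcp45_mon10
```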
Upvotes: 3
Views: 40
Reputation: 785481
You may use re.sub here:
>>> import re
>>> s = 'diff_pr_EUR-44_cordex_rcp45_mon_ave_2048-2060_minus_2005-2017_mon10_ave1_withsd.nc'
>>> print(re.sub(r'^.*?(rcp\d+).*(_mon\d+).*', r'\1\2', s))
rcp45_mon10
Details:
^.*?: Match 0 or more of any characters at the start (lazy)
(rcp\d+): Match and capture rcp followed by 1+ digits in group #1
.*: Match 0 or more of any characters (greedy, so it backtracks to the last _mon followed by digits)
(_mon\d+): Match and capture _mon followed by 1+ digits in group #2
.*: Match the rest of the string up to the end
r'\1\2': Replace the whole string with back-references to group #1 and group #2
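An alternative sketch (not part of the answer above) uses re.search instead of re.sub, so the pattern only has to cover the two pieces rather than the whole string. One practical difference: when nothing matches, re.sub returns the input unchanged, whereas re.search returns None, which makes a missing match easy to detect. The helper name extract_tag is hypothetical.

```python
import re

def extract_tag(filename):
    # Hypothetical helper: same two groups as the re.sub pattern above,
    # without the ^.*? and trailing .* that re.sub needs to consume the rest.
    m = re.search(r'(rcp\d+).*(_mon\d+)', filename)
    return m.group(1) + m.group(2) if m else None

s = 'diff_pr_EUR-44_cordex_rcp45_mon_ave_2048-2060_minus_2005-2017_mon10_ave1_withsd.nc'
print(extract_tag(s))                      # rcp45_mon10
print(extract_tag('no_scenario_here.nc'))  # None

# For comparison: re.sub leaves a non-matching string unchanged.
print(re.sub(r'^.*?(rcp\d+).*(_mon\d+).*', r'\1\2', 'no_scenario_here.nc'))
```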
Upvotes: 2
Reputation: 51165
You can match using two capturing groups and join them:
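>>> import re
>>> s = 'diff_pr_EUR-44_cordex_rcp45_mon_ave_2048-2060_minus_2005-2017_mon10_ave1_withsd.nc'
>>> ''.join(re.findall(r'(rcp\d{2}).*?(_mon\d{2})', s)[0])
'rcp45_mon10'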
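The [0] index raises IndexError when findall finds nothing. A minimal sketch that wraps the same pattern in a helper returning None instead (the names PATTERN and scenario_month are illustrative, not from the answer):

```python
import re

# Same pattern as the findall answer; compiled once for reuse.
PATTERN = re.compile(r'(rcp\d{2}).*?(_mon\d{2})')

def scenario_month(filename):
    """Hypothetical helper: return an 'rcp45_mon10'-style tag, or None."""
    # With two groups, findall returns a list of (group1, group2) tuples.
    matches = PATTERN.findall(filename)
    return ''.join(matches[0]) if matches else None

s = 'diff_pr_EUR-44_cordex_rcp45_mon_ave_2048-2060_minus_2005-2017_mon10_ave1_withsd.nc'
print(scenario_month(s))             # rcp45_mon10
print(scenario_month('no_match.nc'))  # None
```

Note that \d{2} requires exactly two digits after _mon, so a single-digit month such as _mon1_ would not match; the \d+ used in the re.sub answer is more permissive.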
Upvotes: 2