Reputation: 397
Say I have string = "qwop(8) 5"
and I want to return the position of 8.
I have the following solution:
import re
string = "qwop(8) 5"
regex = re.compile(r"\(\d\)")
match = re.search(regex, string)  # match object has span = (4, 7)
print(match.span()[0] + 1)  # +1 skips the opening bracket and lands on the 8
This seems really messy. Is there a more sophisticated solution? Preferably using re
as I've already imported that for other uses.
Upvotes: 5
Views: 4169
Reputation: 61
import re
s = "qwop(8)(9) 5"
regex = re.compile(r"\(\d\)")
match = re.search(regex, s)
print(match.start() + 1)
start() returns the start index of the match, and re.search finds only the first occurrence, so this prints the index of the digit in (8) only and ignores (9).
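If the position of every bracketed digit is needed rather than only the first, re.finditer can be used in much the same way. This is a minimal sketch, assuming a capture group is added to the pattern; the name positions is only illustrative:

```python
import re

s = "qwop(8)(9) 5"
# A capture group around \d lets start(1) point at the digit itself,
# so no +1 is needed.
pattern = re.compile(r"\((\d)\)")

# finditer yields one match object per non-overlapping occurrence.
positions = [m.start(1) for m in pattern.finditer(s)]
print(positions)  # [5, 8]
```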
Upvotes: 2
Reputation: 476709
You can use:
regex = re.compile(r'\((\d+)\)')
The r prefix means that we are working with a raw string. A raw string means that if you write, for instance, r'\n', Python will not interpret this as a string with a newline character, but as a string with two characters: a backslash ('\\') and an 'n'.
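The difference can be checked directly. A small sketch, with raw and cooked as purely illustrative names:

```python
# The raw string keeps the backslash; the normal string turns \n into a newline.
raw = r'\n'
cooked = '\n'

print(len(raw), len(cooked))  # 2 1
print(raw == '\\n')           # True
```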
The additional brackets define a capture group. Furthermore, a number is a sequence of one or more digits, so the + makes sure that we capture (1425) as well.
We can then perform a .search() and obtain a match. You can then use .start(1) to obtain the start of the first capture group:
>>> data = 'qwop(8) 5'
>>> regex.search(data)
<_sre.SRE_Match object; span=(4, 7), match='(8)'>
>>> regex.search(data).start(1)
5
If you are interested in the content of the first capture group, you can call .group(1):
>>> regex.search(data).group(1)
'8'
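To see the effect of the + on a longer number, here is a short sketch using the same pattern on a made-up input string (the value of data below is an assumption for illustration, not taken from the question):

```python
import re

regex = re.compile(r'\((\d+)\)')
data = 'qwop(1425) 5'  # hypothetical input with a multi-digit number

m = regex.search(data)
print(m.span(1))                  # (5, 9)
print(m.group(1))                 # 1425
print(data[m.start(1):m.end(1)])  # 1425
```

Here .span(1) gives the start and end of the capture group in one call, which is convenient when slicing the original string.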
Upvotes: 2
Reputation: 2085
Use match.start() to get the start index of the match, and a capturing group to capture specifically the digit between the brackets, which avoids the +1 in the index. If you want the very start of the pattern, use match.start(); if you only want the digit, use match.start(1):
import re
test_str = 'qwop(8) 5'
pattern = r'\((\d)\)'
match = re.search(pattern, test_str)
start_index = match.start()
print('Start index:\t{}\nCharacter at index:\t{}'.format(start_index,
                                                         test_str[start_index]))
match_index = match.start(1)
print('Match index:\t{}\nCharacter at index:\t{}'.format(match_index,
                                                         test_str[match_index]))
Output:
Start index: 4
Character at index: (
Match index: 5
Character at index: 8
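One thing to watch for: re.search returns None when the pattern is not found, so calling .start() on the result would raise an AttributeError. A possible way to wrap this up, sketched with a hypothetical helper named digit_index:

```python
import re

def digit_index(text):
    """Return the index of the bracketed digit, or None if there is none."""
    match = re.search(r'\((\d)\)', text)
    # re.search returns None when nothing matches,
    # so guard before calling .start().
    return match.start(1) if match else None

print(digit_index('qwop(8) 5'))  # 5
print(digit_index('qwop 5'))     # None
```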
Upvotes: 5