imhans4305

Reputation: 697

Start and End Position of symbols in a string

I am trying to find the start and end positions of each run of _ in a string, as a list of tuples.

The code I used is

import re

sentence = 'special events _______ ______ ___ _______ ____ _____ _______ ___________ brochure subscriptions ticket guide'
symbol = '_'

position = [(match.start(), match.end()) for match in re.finditer(symbol, sentence)]

The output I get is

[(15, 16), (16, 17), (17, 18), (18, 19), (19, 20), ...]

How can I get the start and end positions of each run of consecutive symbols as a list of tuples?

Upvotes: 1

Views: 728

Answers (2)

constantstranger

Reputation: 9379

You can do this:

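# sentence2[i] is the character that precedes sentence[i] (a space for i == 0)
sentence2 = ' ' + sentence[:-1]
# a run starts where the current character is '_' and the previous one is not
starts = [i for i in range(len(sentence)) if sentence[i] == '_' and sentence2[i] != '_']
# a run ends just before a non-'_' character that follows a '_'
ends = [i - 1 for i in range(len(sentence)) if sentence2[i] == '_' and sentence[i] != '_']
pairs = list(zip(starts, ends))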
print(pairs)

Output:

[(15, 21), (23, 28), (30, 32), (34, 40), (42, 45), (47, 51), (53, 59), (61, 71)]

This gives the indices of the first and last symbol in each run of one or more contiguous symbol characters. If you need results that follow Python slice semantics (start is the index of the first symbol in a run, end is the index immediately after the last symbol in that run), change i - 1 to i in the line that builds ends.
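One caveat: the ends list only records an end when a non-symbol character follows the run, so a run at the very end of the string gets a start but no end, and zip drops it. A possible alternative that handles that case is sketched below with itertools.groupby. The function name symbol_runs and its inclusive parameter are made up for this illustration and are not part of the code above.

```python
from itertools import groupby

def symbol_runs(text, symbol='_', inclusive=True):
    """Return (start, end) pairs for each run of `symbol` in `text`.

    Hypothetical helper: with inclusive=True, end is the index of the last
    symbol in the run; otherwise end follows slice semantics.
    """
    runs = []
    index = 0  # position in text where the current group starts
    for char, group in groupby(text):
        length = sum(1 for _ in group)  # number of characters in this group
        if char == symbol:
            end = index + length - 1 if inclusive else index + length
            runs.append((index, end))
        index += length
    return runs

print(symbol_runs('ab__c___'))                   # [(2, 3), (5, 7)]
print(symbol_runs('ab__c___', inclusive=False))  # [(2, 4), (5, 8)]
```

Because groupby walks the string group by group, the last run is reported even when nothing follows it.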

Upvotes: 1

Matthias

Reputation: 13232

You should add the + quantifier so that each match covers a whole run of symbols instead of a single character. And since symbol could be a character with special meaning in a regular expression, you might want to escape it with re.escape.

import re

sentence = 'special events _______ ______ ___ _______ ____ _____ _______ ___________ brochure subscriptions ticket guide'
symbol = '_'

needle = f'{re.escape(symbol)}+'
position = [(match.start(),match.end()) for match in re.finditer(needle, sentence)]
print(position)

The result is [(15, 22), (23, 29), (30, 33), (34, 41), (42, 46), (47, 52), (53, 60), (61, 72)].

Please be aware that end is the position after the match as stated in the documentation for Match.end.
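To illustrate what that means in practice, here is a small sketch on a shortened sentence (the shortened string and the variable names spans, runs and inclusive are chosen only for this example). Because end is exclusive, each pair can be used directly as a slice, and subtracting 1 gives the index of the last symbol if that is what you need.

```python
import re

# Shortened example sentence, used only for this illustration
sentence = 'special events _______ ______ brochure'
needle = re.escape('_') + '+'

# Match.span() returns the same (start, end) pair as (start(), end())
spans = [match.span() for match in re.finditer(needle, sentence)]

# end is exclusive, so the pair works directly as a slice
runs = [sentence[start:end] for start, end in spans]

# subtract 1 to get the index of the last symbol in each run
inclusive = [(start, end - 1) for start, end in spans]

print(spans)      # [(15, 22), (23, 29)]
print(runs)       # ['_______', '______']
print(inclusive)  # [(15, 21), (23, 28)]
```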

Upvotes: 2
