Reputation: 73
import re
text = "~SR1*abcde*1234*~end~SR*abcdef*123*~end~SR11*abc*12345*~end"
I have a text that is repetitive in nature. Each repetition starts with '~SR' and ends with '~end'. I want to find the index of the 1st, 2nd, and 3rd '*' (asterisk) in each repetition.
def start_point(p1):
    # Collect the start index of every match of pattern p1 in text
    segment_start_array = []
    for match in re.finditer(p1, text):
        index = match.start()
        segment_start_array.append(index)
    return segment_start_array

def point_a(p1):
    a = start_point(p1)
    return a

def point_b(p2):
    b = start_point(p2)
    return b

def get_var_section(p1, p2):
    # Slice text from each p1 match up to the corresponding p2 match
    var_list = []
    for each in range(len(start_point(p1))):
        section = text[point_a(p1)[each]:point_b(p2)[each]]
        var_list.append(section)
    return var_list

print(get_var_section('~SR', '~end'))
==> Result: ['~SR1*abcde*1234*', '~SR*abcdef*123*', '~SR11*abc*12345*']
What I did first was put the repetitions into a list, which resulted in three elements. I thought this would make it easier to find the position of each asterisk, but when I tried to find the index of the 1st and 2nd asterisk, the results were the same.
def test(p1, p2, occurrence):
    var_list4 = []
    for i in get_var_section(p1, p2):
        x = i.find('*', occurrence)
        var_list4.append(x)
    return var_list4

print(test('~SR', '~end', 1))
print(test('~SR', '~end', 2))
==> Result: [4, 3, 5]
==> Result: [4, 3, 5]
I don't understand why the result didn't change after I switched to looking for the 2nd occurrence.
Upvotes: 1
Views: 455
Reputation: 504
As you mentioned that each segment starts with ~SR and ends with ~end, I split the string on ~end and then looped through the resulting list, using item as the loop variable, to find the indexes of the asterisks in each item.
import numpy as np  # needed for the transpose at the end

text = "~SR1*abcde*1234*~end~SR*abcdef*123*~end~SR11*abc*12345*~end"
text_list = text.split('~end')
index = []
for item in text_list:
    #print(item)
    if len(item) > 0:  # skip the empty string left after the final '~end'
        ind = [i for i, val in enumerate(item) if val == '*']
        #print(ind)
        index.append(ind)
index_new = np.array(index).T.tolist() #transpose of list of lists
Result
print("index")
[[4, 10, 15], [3, 10, 14], [5, 9, 15]]
print("index_new")
[[4, 3, 5], [10, 10, 9], [15, 14, 15]]
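As for why the original test() printed the same list twice: the second argument of str.find is the index at which the search starts, not an occurrence count. Starting the search at index 1 or at index 2 still lands on the same first asterisk in every segment, so both calls return [4, 3, 5]. If you would rather keep a find-based approach, here is a minimal sketch that repeats the search from just past the previous hit. The helper name find_nth is my own for illustration and is not part of the standard library.

```python
def find_nth(segment, char, n):
    """Return the index of the n-th (1-based) occurrence of char, or -1 if absent."""
    pos = -1
    for _ in range(n):
        # Resume the search one position past the previous match
        pos = segment.find(char, pos + 1)
        if pos == -1:
            return -1
    return pos

text = "~SR1*abcde*1234*~end~SR*abcdef*123*~end~SR11*abc*12345*~end"
# Drop the empty string that split() leaves after the final '~end'
segments = [s for s in text.split('~end') if s]

first = [find_nth(s, '*', 1) for s in segments]
second = [find_nth(s, '*', 2) for s in segments]
third = [find_nth(s, '*', 3) for s in segments]

print(first)   # [4, 3, 5]
print(second)  # [10, 10, 9]
print(third)   # [15, 14, 15]
```

This gives the same numbers as index_new above without needing numpy, and it returns -1 for a segment that has fewer than n asterisks.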
Upvotes: 4