Reputation: 61
I want to extract myinformation_1 and myinformation_2 from the list below. My code is not working yet. Can you please help?
Thank you, HHC
import re
start = re.escape(">")
end = re.escape("<")
stringlist =['<div class="ant-space-item"><a href="/holdings-of-1">myinformation_1</a></div>',
'<div class="ant-space-item"><a href="/holdings-of-2avbf">myinformation_2</a></div>']
for i in stringlist:
    result = re.search('%s(.*)%s' % (start, end), i).group(1)
    print(result)
Upvotes: 0
Views: 57
Reputation: 11612
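Your pattern >(.*)< fails because .* is greedy: it matches from the first > to the last < in the string, so the group captures the whole <a ...>...</a> element instead of just its text. Try a more specific regex, e.g. <a href="/holdings-of-[^"]+">([^<]*) in this case: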
import re
stringlist =['<div class="ant-space-item"><a href="/holdings-of-1">myinformation_1</a></div>',
'<div class="ant-space-item"><a href="/holdings-of-2adf">myinformation_2</a></div>']
for i in stringlist:
    result = re.search(r'<a href="/holdings-of-[^"]+">([^<]*)', i).group(1)
    print(result)
Output:
myinformation_1
myinformation_2
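One thing to watch for: re.search returns None when nothing matches, so calling .group(1) directly raises AttributeError on a string without the expected link. A minimal sketch that guards against this, assuming unmatched strings should simply yield None (the helper name extract_link_text and the third input string are illustrative, not from the question):

```python
import re

# Same pattern as above: capture the text right after the opening <a> tag.
LINK_TEXT = re.compile(r'<a href="/holdings-of-[^"]+">([^<]*)')

def extract_link_text(html):
    """Return the captured link text, or None if the pattern does not match."""
    match = LINK_TEXT.search(html)
    return match.group(1) if match else None

stringlist = [
    '<div class="ant-space-item"><a href="/holdings-of-1">myinformation_1</a></div>',
    '<div class="ant-space-item"><a href="/holdings-of-2adf">myinformation_2</a></div>',
    '<div class="ant-space-item">no link here</div>',  # hypothetical non-matching input
]

results = [extract_link_text(s) for s in stringlist]
print(results)
```

Compiling the pattern once is optional here; it mainly keeps the loop body short.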
Or, as suggested in the comments, you can use a more general expression that works for any <a> tag, such as <a.*?>([^<]*).
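A short sketch of the general pattern applied to the same list. Note that <a.*?> also matches any other tag whose name starts with "a" (for example <abbr>), so a slightly stricter variant with a word boundary is shown next to it; that variant and the <abbr> sample string are my own illustration, not something from the comments:

```python
import re

stringlist = [
    '<div class="ant-space-item"><a href="/holdings-of-1">myinformation_1</a></div>',
    '<div class="ant-space-item"><a href="/holdings-of-2adf">myinformation_2</a></div>',
]

# General pattern from the comments: any tag starting with "<a", lazily up to ">".
general = re.compile(r'<a.*?>([^<]*)')
# Stricter variant (assumption): \b ensures the tag name is exactly "a".
strict = re.compile(r'<a\b[^>]*>([^<]*)')

texts = [general.search(s).group(1) for s in stringlist]
print(texts)

# Hypothetical input showing where the two patterns differ.
sample = '<abbr>x</abbr><a href="/y">z</a>'
general_hit = general.search(sample).group(1)  # picks up the <abbr> text
strict_hit = strict.search(sample).group(1)    # skips <abbr>, finds the link
print(general_hit, strict_hit)
```

For anything beyond simple, regular snippets like these, an HTML parser is usually more robust than a regex.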
Upvotes: 1