Reputation: 61

Extracting information

I want to extract myinformation_1 and myinformation_2 from the list below.

My code is not working yet.

Can you please help?

Thank you, HHC

import re
start = re.escape(">")
end   = re.escape("<")
stringlist =['<div class="ant-space-item"><a href="/holdings-of-1">myinformation_1</a></div>', 
    '<div class="ant-space-item"><a href="/holdings-of-2avbf">myinformation_2</a></div>']
for i in stringlist :
    result = re.search('%s(.*)%s' % (start, end), i).group(1)
    print(result)

Upvotes: 0

Answers (1)

Wizard.Ritvik

Reputation: 11612

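The pattern >(.*)< is greedy: .* runs from the first > to the last <, so group(1) ends up being the whole <a ...>...</a> element rather than its text. Try a more specific regex instead, e.g. <a href="/holdings-of-[^"]+">([^<]*) in this case: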

import re

stringlist = [
    '<div class="ant-space-item"><a href="/holdings-of-1">myinformation_1</a></div>',
    '<div class="ant-space-item"><a href="/holdings-of-2adf">myinformation_2</a></div>',
]

for i in stringlist:
    # Group 1 captures the link text: everything after the opening <a ...> tag up to the next "<".
    result = re.search(r'<a href="/holdings-of-[^"]+">([^<]*)', i).group(1)
    print(result)

Output:

myinformation_1
myinformation_2

Or, as suggested in the comments, you can use a more general expression that works for any <a> tag, such as <a.*?>([^<]*).
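
A minimal sketch of that generalized approach, for illustration only. It uses a slight variation of the pattern, <a\b[^>]*>([^<]*), because <a.*?> would also match other tags that start with "a" (such as <abbr>). The helper name anchor_text and the None fallback are assumptions added here, not part of the original question; the fallback avoids an AttributeError when a string contains no <a> tag.

```python
import re

# Variation of the generalized pattern: \b keeps tags like <abbr> from matching,
# and [^>]* stays inside the opening tag instead of relying on a lazy .*?
ANCHOR_TEXT = re.compile(r'<a\b[^>]*>([^<]*)')


def anchor_text(html):
    """Return the text of the first <a> tag in html, or None if there is none.

    Hypothetical helper, named here for illustration only.
    """
    match = ANCHOR_TEXT.search(html)
    return match.group(1) if match else None


stringlist = [
    '<div class="ant-space-item"><a href="/holdings-of-1">myinformation_1</a></div>',
    '<div class="ant-space-item"><a href="/holdings-of-2adf">myinformation_2</a></div>',
]

results = [anchor_text(s) for s in stringlist]
print(results)
```

For anything beyond simple, predictable snippets like these, an HTML parser is usually a safer choice than a regex.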

Upvotes: 1
