Reputation: 57
I am trying to scrape a website and I want to extract only the links that match this pattern: /market_information/announcements/company_announcement/announcement_details?ann_id=
Is it possible to do this using regex? Below is my script, followed by the output it currently prints:
title = soup.find_all('tbody')
for i in title:
    for link in i.find_all('a'):
        print(link['href'])
/trade/trading_resources/listing_directory/company-profile?stock_code=7374
/market_information/announcements/company_announcement/announcement_details?ann_id=393738
/trade/trading_resources/listing_directory/company-profile?stock_code=1201
/market_information/announcements/company_announcement/announcement_details?ann_id=393742
/trade/trading_resources/listing_directory/company-profile?stock_code=6874
/market_information/announcements/company_announcement/announcement_details?ann_id=393583
/trade/trading_resources/listing_directory/company-profile?stock_code=4634
/market_information/announcements/company_announcement/announcement_details?ann_id=393572
/trade/trading_resources/listing_directory/company-profile?stock_code=8176
/market_information/announcements/company_announcement/announcement_details?ann_id=393745
/trade/trading_resources/listing_directory/company-profile?stock_code=9474
/market_information/announcements/company_announcement/announcement_details?ann_id=393579
/trade/trading_resources/listing_directory/company-profile?stock_code=4561
/market_information/announcements/company_announcement/announcement_details?ann_id=393743
/trade/trading_resources/listing_directory/company-profile?stock_code=2577
/market_information/announcements/company_announcement/announcement_details?ann_id=393576
/trade/trading_resources/listing_directory/company-profile?stock_code=2984
/market_information/announcements/company_announcement/announcement_details?ann_id=393575
/trade/trading_resources/listing_directory/company-profile?stock_code=2828
/market_information/announcements/company_announcement/announcement_details?ann_id=393739
/trade/trading_resources/listing_directory/company-profile?stock_code=6874
/market_information/announcements/company_announcement/announcement_details?ann_id=393737
/trade/trading_resources/listing_directory/company-profile?stock_code=6181
/market_information/announcements/company_announcement/announcement_details?ann_id=393748
/trade/trading_resources/listing_directory/company-profile?stock_code=2984
/market_information/announcements/company_announcement/announcement_details?ann_id=393582
/trade/trading_resources/listing_directory/company-profile?stock_code=0021
/market_information/announcements/company_announcement/announcement_details?ann_id=393578
/trade/trading_resources/listing_directory/company-profile?stock_code=5028
/market_information/announcements/company_announcement/announcement_details?ann_id=393740
/trade/trading_resources/listing_directory/company-profile?stock_code=6246
/market_information/announcements/company_announcement/announcement_details?ann_id=393573
/trade/trading_resources/listing_directory/company-profile?stock_code=1201
/market_information/announcements/company_announcement/announcement_details?ann_id=393571
/trade/trading_resources/listing_directory/company-profile?stock_code=7143
/market_information/announcements/company_announcement/announcement_details?ann_id=393577
/trade/trading_resources/listing_directory/company-profile?stock_code=0091
/market_information/announcements/company_announcement/announcement_details?ann_id=393747
/trade/trading_resources/listing_directory/company-profile?stock_code=7722
/market_information/announcements/company_announcement/announcement_details?ann_id=393581
/media-releases-rss.rss
Upvotes: 0
Views: 43
Reputation: 149
Instead of using regex for something like this, it would be better to use the in operator to check whether the link's href contains the substring.
You can do something like:
substring = "/market_information/announcements/company_announcement/announcement_details?ann_id="
title = soup.find_all('tbody')
for i in title:
    for link in i.find_all('a'):
        href = link.get('href', '')  # test the href string, not the Tag object itself
        if substring in href:
            print(href)  # This is the link that contained that substring
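To try the substring check without fetching the page, here is a minimal sketch that works on a plain list of href strings. The function name filter_announcement_links and the sample list are made up for illustration; the sample values are copied from the output in the question. With BeautifulSoup, the hrefs would come from each tag's href attribute instead.

```python
SUBSTRING = "/market_information/announcements/company_announcement/announcement_details?ann_id="

def filter_announcement_links(hrefs):
    """Return only the hrefs that contain the announcement-details substring.

    Hypothetical helper for illustration; None or empty hrefs are skipped.
    """
    return [href for href in hrefs if href and SUBSTRING in href]

# Sample hrefs taken from the output shown in the question.
sample_hrefs = [
    "/trade/trading_resources/listing_directory/company-profile?stock_code=7374",
    "/market_information/announcements/company_announcement/announcement_details?ann_id=393738",
    None,  # stands in for an <a> tag that has no href attribute
    "/media-releases-rss.rss",
]

for href in filter_announcement_links(sample_hrefs):
    print(href)
```

Because in does a literal comparison, the ? needs no escaping here, which is the main reason to prefer it over regex for a fixed substring.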
Upvotes: 0
Reputation: 1179
You can use regex; just escape the ? symbol with a backslash (\?), and use regex101.com to check your pattern.
import re

links = ['/trade/trading_resources/listing_directory/company-profile?stock_code=7374/market_information/announcements/company_announcement/announcement_details?ann_id=393738',
         'some_other_link']
for link in links:
    # Raw string so that \? reaches the regex engine as an escaped, literal "?"
    if re.search(r'/market_information/announcements/company_announcement/announcement_details\?ann_id=', link):
        use_this_link = True
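If you also want the numeric ID, regex becomes more useful than a plain substring test. Below is a rough sketch, assuming the ID after ann_id= is always digits (as in the output in the question). The names PREFIX, ANN_PATTERN and extract_ann_ids are made up for illustration. re.escape is used so the ? does not have to be escaped by hand.

```python
import re

PREFIX = "/market_information/announcements/company_announcement/announcement_details?ann_id="

# re.escape escapes the "?" (and any other metacharacters) in the fixed prefix;
# the capture group grabs the digits that follow it (assumed to be numeric).
ANN_PATTERN = re.compile(re.escape(PREFIX) + r"(\d+)")

def extract_ann_ids(links):
    """Return the ann_id value of every link that matches the pattern."""
    ids = []
    for link in links:
        match = ANN_PATTERN.search(link)
        if match:
            ids.append(match.group(1))
    return ids

# Sample hrefs taken from the output shown in the question.
hrefs = [
    "/trade/trading_resources/listing_directory/company-profile?stock_code=7374",
    "/market_information/announcements/company_announcement/announcement_details?ann_id=393738",
    "/media-releases-rss.rss",
]

print(extract_ann_ids(hrefs))
```

The IDs are returned as strings; convert them with int() if you need numbers.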
Upvotes: 1