Reputation: 23
I am trying to scrape all the match report links from the page, but there is a 'load more' button and I don't want to use Selenium. Is there any way to collect all the links without Selenium? Thanks in advance.
Here is what I tried:
from bs4 import BeautifulSoup as bs
import requests
r=requests.get('https://www.iplt20.com/news/match-reports')
soup = bs(r.text,'lxml')
for match in soup.find_all('div', class_='latest-slider-wrap position-relative'):
    links = match.find('a')
    print(links['href'])
Upvotes: 2
Views: 43
Reputation: 195553
Try:
import requests
from bs4 import BeautifulSoup
url = "https://www.iplt20.com/news/match-reports"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
# Select anchors inside #div-match-report that contain an <li> element
for a in soup.select("#div-match-report a:has(li)"):
    print(a["href"])
Prints:
https://www.iplt20.com/news/4014/tata-ipl-2024-match-11-lsg-vs-pbks-match-report
https://www.iplt20.com/news/4012/tata-ipl-2024-match-10-rcb-vs-kkr-match-report
https://www.iplt20.com/news/4011/tata-ipl-2024-match-09-rr-vs-dc-match-report
https://www.iplt20.com/news/4009/tata-ipl-2024-match-08-srh-vs-mi-match-report
https://www.iplt20.com/news/4007/tata-ipl-2024-match-07-csk-vs-gt-match-report
https://www.iplt20.com/news/4006/tata-ipl-2024-match-06-rcb-vs-pbks-match-report
https://www.iplt20.com/news/4004/tata-ipl-2024-match-05-gt-vs-mi-match-report
https://www.iplt20.com/news/4003/tata-ipl-2024-match-04-rr-vs-lsg-match-report
https://www.iplt20.com/news/4001/tata-ipl-2024-match-03-kkr-vs-srh-match-report
https://www.iplt20.com/news/4000/tata-ipl-2024-match-02-pbks-vs-dc-match-report
https://www.iplt20.com/news/3999/tata-ipl-2024-match-01-csk-vs-rcb-match-report
https://www.iplt20.com/news/3976/tata-ipl-2023-final-csk-vs-gt-match-report
https://www.iplt20.com/news/3974/tata-ipl-2023-qualifier-2-gt-vs-mi-match-report
https://www.iplt20.com/news/3973/tata-ipl-2023-eliminator-lsg-vs-mi-match-report
https://www.iplt20.com/news/3972/tata-ipl-2023-qualifier-1-gt-vs-csk-match-report
https://www.iplt20.com/news/3971/tata-ipl-2023-match-70-rcb-vs-gt-match-report
https://www.iplt20.com/news/3970/tata-ipl-2023-match-69-mi-vs-srh-match-report
https://www.iplt20.com/news/3969/tata-ipl-2023-match-68-kkr-vs-lsg-match-report
https://www.iplt20.com/news/3968/tata-ipl-2023-match-67-dc-vs-csk-match-report
https://www.iplt20.com/news/3967/tata-ipl-2023-match-66-pbks-vs-rr-match-report
https://www.iplt20.com/news/3966/tata-ipl-2023-match-65-srh-vs-rcb-match-report
EDIT: To get all links you can use their Ajax pagination API:
import requests
api_url = "https://www.iplt20.com/add-more-match-report?page={page}&type=match-reports"
for page in range(1, 4):  # <-- adjust number of pages here
    print(f"{page=}")
    data = requests.get(api_url.format(page=page)).json()
    # Build each report URL from the item's id and URL slug
    for d in data["newsResponce"]["data"]:
        print(f'https://www.iplt20.com/news/{d["id"]}/{d["titleUrlSegment"]}')
Prints:
...
page=2
https://www.iplt20.com/news/3964/tata-ipl-2023-match-64-pbks-vs-dc-match-report
https://www.iplt20.com/news/3963/tata-ipl-2023-match-63-lsg-vs-mi-match-report
https://www.iplt20.com/news/3962/tata-ipl-2023-match-62-gt-vs-srh-match-report
https://www.iplt20.com/news/3960/tata-ipl-2023-match-61-csk-vs-kkr-match-report
https://www.iplt20.com/news/3959/tata-ipl-2023-match-60-rr-vs-rcb-match-report
https://www.iplt20.com/news/3958/tata-ipl-2023-match-59-dc-vs-pbks-match-report
https://www.iplt20.com/news/3956/tata-ipl-2023-match-58-srh-vs-lsg-match-report
https://www.iplt20.com/news/3955/tata-ipl-2023-match-57-mi-vs-gt-match-report
https://www.iplt20.com/news/3953/tata-ipl-2023-match-56-kkr-vs-rr-match-report
https://www.iplt20.com/news/3952/tata-ipl-2023-match-55-csk-vs-dc-match-report
https://www.iplt20.com/news/3951/tata-ipl-2023-match-54-mi-vs-rcb-match-report
https://www.iplt20.com/news/3947/tata-ipl-2023-match-53-kkr-vs-pbks-match-report
https://www.iplt20.com/news/3946/tata-ipl-2023-match-52-rr-vs-srh-match-report
https://www.iplt20.com/news/3945/tata-ipl-2023-match-51-gt-vs-lsg-match-report
https://www.iplt20.com/news/3944/tata-ipl-2023-match-50-dc-vs-rcb-match-report
https://www.iplt20.com/news/3943/tata-ipl-2023-match-49-csk-vs-mi-match-report
https://www.iplt20.com/news/3942/tata-ipl-2023-match-48-rr-vs-gt-match-report
https://www.iplt20.com/news/3940/tata-ipl-2023-match-47-srh-vs-kkr-match-report
https://www.iplt20.com/news/3938/tata-ipl-2023-match-46-pbks-vs-mi-match-report
https://www.iplt20.com/news/3937/tata-ipl-2023-match-45-lsg-vs-csk-match-report
https://www.iplt20.com/news/3936/tata-ipl-2023-match-44-gt-vs-dc-match-report
page=3
https://www.iplt20.com/news/3934/tata-ipl-2023-match-43-lsg-vs-rcb-match-report
https://www.iplt20.com/news/3932/tata-ipl-2023-match-42-mi-vs-rr-match-report
https://www.iplt20.com/news/3931/tata-ipl-2023-match-41-csk-vs-pbks-match-report
https://www.iplt20.com/news/3930/tata-ipl-2023-match-40-dc-vs-srh-match-report
...
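If you don't know the number of pages in advance, one option is to keep requesting pages until one comes back empty. The sketch below is untested against the live site and rests on an assumption: that the endpoint returns an empty "data" list once you go past the last page. The names iter_report_links, fetch_page_with_requests and max_pages are made up for this example; only the URL and the JSON keys come from the code above.

```python
from typing import Callable, Iterator

API_URL = "https://www.iplt20.com/add-more-match-report?page={page}&type=match-reports"
LINK_TEMPLATE = "https://www.iplt20.com/news/{id}/{slug}"


def iter_report_links(
    fetch_page: Callable[[int], dict], max_pages: int = 100
) -> Iterator[str]:
    """Yield report links page by page until a page has no items.

    ASSUMPTION: a page past the end has an empty (or missing) "data" list.
    max_pages is a safety cap in case that assumption does not hold.
    """
    for page in range(1, max_pages + 1):
        payload = fetch_page(page)
        items = (payload.get("newsResponce") or {}).get("data") or []
        if not items:
            break  # nothing left to load
        for item in items:
            yield LINK_TEMPLATE.format(id=item["id"], slug=item["titleUrlSegment"])


def fetch_page_with_requests(page: int) -> dict:
    """Fetch one page of the pagination endpoint and return the parsed JSON."""
    import requests  # imported here so the loop above has no hard dependency

    response = requests.get(API_URL.format(page=page), timeout=10)
    response.raise_for_status()
    return response.json()
```

To print everything, loop over iter_report_links(fetch_page_with_requests) and print each link. Passing the fetch function in as a parameter also lets you try the loop against canned data without hitting the site.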
Upvotes: 3