Reputation: 139
I want to scrape all the image links on this page. I am using requests and Beautiful Soup with Python 3.7. My problem is that the result is 3, while there are 6 images on the page.
import requests
from bs4 import BeautifulSoup as bs
url='https://ahara.kar.nic.in/FCS_report/ViewRC/dup_rc_view.aspx'
var='240100160336'
payload={'rc_no':var}
headers={'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3','Cookie':'ASP.NET_SessionId=v4kd535hn3d43z0x4ttgzqit','User-Agent':'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36'}
res=requests.get(url,headers=headers,data=payload)
obj=bs(res.text,'html.parser')
#obj=obj.find('table')
imgs=obj.find_all('img')
print(len(imgs))
Edit: The server relies on cookies to serve the full HTML page and the pictures I want. After adding cookie handling and the correct URL to my code, it works as intended.
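The cookie handling mentioned in the edit is not shown, so the following is only a minimal sketch of one way it could look, assuming `requests.Session` is used so that cookies set by the server are sent back automatically. The helper names `extract_image_links` and `fetch_image_links`, the warm-up request, and the shortened User-Agent are assumptions for illustration, not the original fix.

```python
import requests
from bs4 import BeautifulSoup

URL = "https://ahara.kar.nic.in/FCS_report/ViewRC/dup_rc_view.aspx"


def extract_image_links(html):
    """Return the src attribute of every <img> tag that has one."""
    soup = BeautifulSoup(html, "html.parser")
    return [img["src"] for img in soup.find_all("img") if img.get("src")]


def fetch_image_links(rc_no, timeout=10):
    """Fetch the page inside one session so server cookies are reused."""
    with requests.Session() as session:
        session.headers["User-Agent"] = "Mozilla/5.0"
        # Assumption: a first visit is enough for the server to set its session cookie.
        session.get(URL, timeout=timeout)
        # params= puts rc_no in the query string; data= would send a request body instead.
        response = session.get(URL, params={"rc_no": rc_no}, timeout=timeout)
        response.raise_for_status()
        return extract_image_links(response.text)
```

Calling `fetch_image_links('240100160336')` would then return the list of `src` values. Whether the count reaches 6 depends on how the server actually issues its cookies, which this sketch does not verify.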
Upvotes: 3
Views: 198
Reputation: 161
That is because your code only looks for images inside the table tag:
obj=obj.find('table')
and there are only 2 images there.
Try searching for the other images on the page as well:
import requests
from bs4 import BeautifulSoup as bs
url='https://ahara.kar.nic.in/FCS_report/ViewRC/dup_rc_view.aspx?rc_no={};'
#var=input("Enter the variable to Bring Photos links:")
var='240100160336'
url=url.format(var)
headers={'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3','Cookie':'ASP.NET_SessionId=v4kd535hn3d43z0x4ttgzqit','User-Agent':'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36'}
res=requests.get(url,headers=headers)
obj=bs(res.text,'html.parser')
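# Search for images inside the first table (find returns None if there is no table)
objTable=obj.find('table')
imgs=objTable.find_all('img') if objTable else []
# Search the whole page; this result already includes the images found in the table
imgs2=obj.find_all('img')
print(len(imgs2))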
EDIT:
The URL provided in your code is not the same as the one you want to scrape.
The URL in your code is:
https://ahara.kar.nic.in/FCS_report/ViewRC/dup_rc_view.aspx?rc_no={};
The way you format the URL to append the variable leaves a stray semicolon at the end. It produces:
https://ahara.kar.nic.in/FCS_report/ViewRC/dup_rc_view.aspx?rc_no=240100160336;
For help, see the Python documentation for urllib.parse ("Parse URLs into components").
The URL you linked in your post is:
https://ahara.kar.nic.in/FCS_report/ViewRC/dup_rc_view.aspx?rc_no=240100160336
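The only difference between the two URLs is the trailing semicolon, which ends up inside the query string. As a rough sketch, the standard library's `urllib.parse` can build the query string instead of `str.format`, and can show what the query string of a given URL contains. The names `BASE_URL`, `build_url`, and `query_string` are hypothetical helpers introduced here for illustration.

```python
from urllib.parse import urlencode, urlsplit

BASE_URL = "https://ahara.kar.nic.in/FCS_report/ViewRC/dup_rc_view.aspx"


def build_url(rc_no):
    """Append rc_no as a properly encoded query string."""
    return f"{BASE_URL}?{urlencode({'rc_no': rc_no})}"


def query_string(url):
    """Return the raw query component of a URL."""
    return urlsplit(url).query


# The template from the code above keeps the semicolon in the query string.
formatted = (BASE_URL + "?rc_no={};").format("240100160336")
print(query_string(formatted))

# Building the query with urlencode avoids the stray character.
print(build_url("240100160336"))
```

With requests, passing `params={'rc_no': var}` to `requests.get` achieves the same encoding without building the string by hand.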
I modified your code a bit and added the correct URL:
import requests
from bs4 import BeautifulSoup as bs
url='https://ahara.kar.nic.in/FCS_report/ViewRC/dup_rc_view.aspx?rc_no=240100160336'
res=requests.get(url)
obj=bs(res.text, 'html.parser')
# Search for images in the page
imgs=obj.find_all('img')
images = []
for img in imgs:
    images.append(img.get('src'))
print(images)
print(len(images))
Please see if it works now.
Upvotes: 3