Reputation: 93
I am scraping a JS-rendered page ( https://www.flipkart.com/search?q=Acer+Laptops ). On this page the product images are loaded dynamically. Before rendering, the src value of each of these images is
//img1a.flixcart.com/www/linchpin/fk-cp-zion/img/placeholder_9951d0.svg
After rendering, the src should look something like this
https://rukminim1.flixcart.com/image/312/312/kcp4osw0/computer/f/w/d/acer-na-thin-and-light-laptop-original-imaftrdmuyxq5nrf.jpeg?q=70
Using requests_html I can get the src values, but it only works for the first few images at the top of the page. How can I get the rest? My code:
from requests_html import HTMLSession

session = HTMLSession()
res = session.get("https://www.flipkart.com/search?q=Acer+Laptops")
res.html.render()
all_results = res.html.find('#container > div > div.t-0M7P._2doH3V > div._3e7xtJ > div._1HmYoV.hCUpcT > div:nth-child(2)', first=True)  # Container for all the results
items = all_results.find('._1UoZlX')  # Container for each product being displayed
for item in items:
    item_image = item.find('div._3BTv9X img', first=True).attrs.get('src')
    print(item_image)
Output:
https://rukminim1.flixcart.com/image/312/312/kamtsi80/computer/m/8/y/acer-na-gaming-laptop-original-imafs5prytwgrcyf.jpeg?q=70
https://rukminim1.flixcart.com/image/312/312/kcp4osw0/computer/f/w/d/acer-na-thin-and-light-laptop-original-imaftrdmuyxq5nrf.jpeg?q=70
//img1a.flixcart.com/www/linchpin/fk-cp-zion/img/placeholder_9951d0.svg
//img1a.flixcart.com/www/linchpin/fk-cp-zion/img/placeholder_9951d0.svg
As you can see, the first two images are loaded but the rest are still placeholders. Thank you all in advance!
Upvotes: 0
Views: 203
Reputation: 93
I found the solution. Because the images are lazily loaded, I had to pass the "scrolldown" and "sleep" parameters to "render()" so that the page is scrolled before the HTML is read. Find the code below:
from requests_html import HTMLSession

session = HTMLSession()
res = session.get("https://www.flipkart.com/search?q=Acer+Laptops")
res.html.render(scrolldown=20, sleep=.1)  # Scroll down 20 times so the lazy images load
all_results = res.html.find('#container > div > div.t-0M7P._2doH3V > div._3e7xtJ > div._1HmYoV.hCUpcT > div:nth-child(2)', first=True)  # Container for all the results
items = all_results.find('._1UoZlX')  # Container for each product being displayed
for item in items:
    item_image = item.find('div._3BTv9X img', first=True).attrs.get('src')
    print(item_image)
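To check whether the chosen scroll count was enough, it may help to count how many src values are still the placeholder. This is only a sketch: the helper name split_loaded is hypothetical, and it assumes a placeholder can be recognised by the substring "placeholder" seen in the question's output. It also turns protocol-relative values (starting with //) into absolute URLs.

```python
from urllib.parse import urljoin

PAGE_URL = "https://www.flipkart.com/search?q=Acer+Laptops"


def split_loaded(srcs, marker="placeholder"):
    """Return (loaded, pending): absolute image URLs vs. placeholder URLs.

    The marker substring is an assumption based on the placeholder file name.
    """
    loaded, pending = [], []
    for src in srcs:
        if not src:
            continue  # Skip images that have no src attribute at all
        absolute = urljoin(PAGE_URL, src)  # Resolves //host/path against https
        (pending if marker in absolute else loaded).append(absolute)
    return loaded, pending


# Sample values copied from the question's output
srcs = [
    "https://rukminim1.flixcart.com/image/312/312/kamtsi80/computer/m/8/y/acer-na-gaming-laptop-original-imafs5prytwgrcyf.jpeg?q=70",
    "//img1a.flixcart.com/www/linchpin/fk-cp-zion/img/placeholder_9951d0.svg",
]
loaded, pending = split_loaded(srcs)
print(len(loaded), "loaded,", len(pending), "still placeholders")
```

If the pending list is not empty, increasing scrolldown or sleep is the first thing to try.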
Upvotes: 0
Reputation: 11515
import re

import requests


def main(url):
    r = requests.get(url)
    # Collect every URL template stored under "dynamicImageUrl" in the raw page source
    match = [x.group(1) for x in re.finditer(
        r'dynamicImageUrl":"(.*?)"', r.text)]
    print(match)


main("https://www.flipkart.com/search?q=Acer+Laptops")
Output:
['http://rukmini1.flixcart.com/flap/{@width}/{@height}/image/c9ef9eae08a3b038.jpg?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 
'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}', 'https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}']
Now you can replace {@width}, {@height} and {@quality} in these templates according to your needs.
The defaults are 312 x 312 with a quality of 70.
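The substitution can be done with plain string replacement. A minimal sketch, where the helper name fill_image_url is hypothetical and the default values are the ones mentioned above:

```python
def fill_image_url(template, width=312, height=312, quality=70):
    """Substitute the {@width}, {@height} and {@quality} placeholders in a URL template."""
    return (template
            .replace("{@width}", str(width))
            .replace("{@height}", str(height))
            .replace("{@quality}", str(quality)))


# One of the templates from the output above
template = "https://rukminim1.flixcart.com/www/{@width}/{@height}/promos/21/07/2017/e8625e14-3277-4f16-a4d4-df8ed525905b.png?q={@quality}"
print(fill_image_url(template))
print(fill_image_url(template, width=128, height=128, quality=50))
```

str.format is avoided here because the {@width} style placeholders are not valid format fields.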
Upvotes: 1