Reputation: 195
I'm writing a Python Selenium script to scrape a user's Instagram posts. If the user has 62 posts, I want to get all 62.
I tried scrolling down until all posts were loaded and then selecting the post elements with XPath. That works, but it returns only 29 elements instead of all 62.
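import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver.get("https://instagram.com/celmirashop/")
# scroll until all posts are loaded (scroll() is my own helper, not shown)
scroll()
wait = WebDriverWait(driver, 15)
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "div.eLAPa")))
time.sleep(30)
# get the list of post cards
list_cards = driver.find_elements(By.XPATH, "//*[@class='v1Nh3 kIKUG _bz0w']")
print(len(list_cards))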
If the user has 62 posts, I want the elements for all 62 of them.
Upvotes: 1
Views: 809
Reputation: 195
When you scroll, Instagram loads 12 new images but removes the 12 images you have already passed from the page. I solved this by saving the posts on every scroll down, so each batch of 12 is stored in a variable before Instagram removes it.
from selenium.webdriver.common.by import By

driver.get("https://instagram.com/celmirashop/")
semua_url_lengkap = []
semua_url_post = []
nomor = 1
for i in range(50):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    print(nomor)
    nomor += 1
    # get the post links that are currently on the page
    article = driver.find_element(By.TAG_NAME, "article")
    list_cards = article.find_elements(By.TAG_NAME, "a")
    for item in list_cards:
        url_lengkap = item.get_attribute("href")
        # skip links that were already saved on an earlier scroll
        if url_lengkap not in semua_url_lengkap:
            semua_url_lengkap.append(url_lengkap)
            segmen = url_lengkap.rsplit('/', 2)
            semua_url_post.append(segmen[1])
print(len(semua_url_post))
print(semua_url_post)
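The rsplit('/', 2) step relies on every href ending with a trailing slash. As a rough sketch only, a small helper that takes the last non-empty path segment should also cope with a missing slash or a query string. The name post_shortcode and the example URL with the made-up code ABC123 are hypothetical and not part of the original script.

```python
from urllib.parse import urlparse


def post_shortcode(url):
    """Return the last non-empty path segment of a URL, or None if there is none.

    Hypothetical helper: it assumes the post identifier is the final path
    segment, which is the same assumption the rsplit('/', 2) call makes.
    """
    # urlparse drops the query string and fragment from .path for us
    parts = [p for p in urlparse(url).path.split("/") if p]
    return parts[-1] if parts else None


# Example with a made-up URL
print(post_shortcode("https://www.instagram.com/p/ABC123/?hl=en"))
```

Inside the loop this could stand in for the two rsplit lines, for example semua_url_post.append(post_shortcode(url_lengkap)).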
Upvotes: 1
Reputation: 788
Instagram is designed in a way that makes it hard to scrape. The elements are lazy loaded, so as you scroll, some of them may disappear as well.
I'd use a generic XPath that is unlikely to change, such as //a//img, because the class names will be changed to something random again.
Also, since you already have a method to scroll, start at the top of the page: log all the elements, scroll some more, and log again. Keep doing that in a loop until you find an element that marks the end of the page, such as //footer.
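Here is a minimal sketch of that log-and-scroll loop, under some assumptions. It expects driver to be a Selenium WebDriver that is already on the profile page. It uses //a[.//img] (links that contain an image) rather than //a//img so that the href can be read. Instead of looking for //footer, it stops once a few scrolls in a row turn up no new links, which is my substitution and not something from the answer above. The function name and the max_rounds, pause, and patience parameters are made up for illustration.

```python
import time


def collect_post_links(driver, max_rounds=100, pause=2.0, patience=3):
    """Scroll the page and collect link hrefs until nothing new shows up.

    Hypothetical helper. `driver` is assumed to be a Selenium WebDriver
    (or any object with the same find_elements / execute_script methods).
    """
    seen = {}  # a dict keeps insertion order, so links stay in page order
    idle_rounds = 0
    for _ in range(max_rounds):
        before = len(seen)
        # "xpath" is the string value of selenium's By.XPATH
        for link in driver.find_elements("xpath", "//a[.//img]"):
            href = link.get_attribute("href")
            if href:
                seen.setdefault(href, None)  # ignores links already logged
        # count consecutive rounds in which no new link was found
        idle_rounds = idle_rounds + 1 if len(seen) == before else 0
        if idle_rounds >= patience:
            break
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # assumed pause so the next batch can load
    return list(seen)
```

Calling links = collect_post_links(driver) after driver.get(...) would return the unique hrefs in the order they appeared. The pause and patience values probably need tuning for a slow connection.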
Upvotes: 0