Niranga Sithara

Reputation: 63

Scraping Instagram with BeautifulSoup

I want to collect the URLs of all the posts on this page and then extract data from each of them.

from bs4 import BeautifulSoup
import selenium.webdriver as webdriver

url = 'https://www.instagram.com/louisvuitton/'
driver = webdriver.Firefox()
driver.get(url)
soup = BeautifulSoup(driver.page_source, 'lxml')
data1 = soup.find_all('div', {'class': '_cmdpi'})
list1 = []
for links in data1:
    list1.append(links.a['href'])
print(list1)

Why does this return only the first link rather than the full list?

Upvotes: 1

Views: 1036

Answers (1)

Andersson

Reputation: 52665

That's because there are multiple links but only one div with class="_cmdpi", so data1 is a list with a single element, and links.a returns only the first <a> inside that div. Try the code below to get all the links without using bs4:

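from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://www.instagram.com/louisvuitton/'
driver = webdriver.Firefox()
driver.get(url)
# Select every <a> inside the div, not the div itself
links = [a.get_attribute('href') for a in driver.find_elements(By.CSS_SELECTOR, 'div._cmdpi a')]
print(links)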
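
If you would rather keep BeautifulSoup, the same idea applies: select the anchors inside the container instead of the container itself. Below is a minimal sketch on a hard-coded HTML snippet. The markup and the /p/... paths are made up for illustration, and html.parser is used in place of lxml only to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the structure described above:
# one container div that holds several post links.
html = """
<div class="_cmdpi">
  <a href="/p/post1/">1</a>
  <a href="/p/post2/">2</a>
  <a href="/p/post3/">3</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all matches the single container div, so taking `.a` on each
# match yields only the first link inside it.
containers = soup.find_all("div", {"class": "_cmdpi"})
first_only = [div.a["href"] for div in containers]

# Selecting the anchors inside the container yields every link.
all_links = [a["href"] for a in soup.select("div._cmdpi a")]

print(first_only)
print(all_links)
```

With the real page, html would be driver.page_source, as in the question.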

Upvotes: 1
