Reputation: 63
I want to get a list of URLs for the posts on this page and then extract the data I need from each of them...
import requests
from bs4 import BeautifulSoup
import selenium.webdriver as webdriver
url = 'https://www.instagram.com/louisvuitton/'
driver = webdriver.Firefox()
driver.get(url)
soup = BeautifulSoup(driver.page_source, 'lxml')
data1 = soup.find_all('div', {'class': '_cmdpi'})
list1 =[]
for links in data1:
    list1.append(links.a['href'])
print(list1)
But why is this getting only the first link rather than a list?
Upvotes: 1
Views: 1036
Reputation: 52665
That's because there are multiple links, but only one div with class="_cmdpi". So data1 is a list that consists of only one element, and links.a returns just the first a tag inside that div. Try the code below to get the required links without using bs4:
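from selenium import webdriver
from selenium.webdriver.common.by import By

url = 'https://www.instagram.com/louisvuitton/'
driver = webdriver.Firefox()
driver.get(url)
# Collect the href of every <a> nested inside the div with class "_cmdpi"
links = [a.get_attribute('href') for a in driver.find_elements(By.CSS_SELECTOR, 'div._cmdpi a')]
print(links)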
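If you would rather keep bs4, the same idea should work with a CSS selector that targets the a tags instead of the wrapping div. Below is a minimal sketch on a made-up HTML snippet (the markup and the /p/... hrefs are placeholders, not the real Instagram page) that shows why the original loop yields a single link and how select() picks up all of them:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for driver.page_source: one wrapper div, several links.
html = """
<div class="_cmdpi">
  <div><a href="/p/aaa/">first</a></div>
  <div><a href="/p/bbb/">second</a></div>
  <div><a href="/p/ccc/">third</a></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Original approach: only one div matches, and .a gives just its first <a>.
divs = soup.find_all("div", {"class": "_cmdpi"})
first_only = [div.a["href"] for div in divs]

# Alternative: select every <a> with an href nested under that div.
all_links = [a["href"] for a in soup.select("div._cmdpi a[href]")]

print(first_only)
print(all_links)
```

Note that bs4 returns the href exactly as written in the markup, so relative paths like these would still need to be joined with the site URL before you request them.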
Upvotes: 1