Reputation: 21
I am trying to use Python's BeautifulSoup library to extract HTML from my LinkedIn "Recently Added Connections" Page. Specifically, I want the name of the most recent connection - it appears towards the top of the page.
When I inspect the HTML for this specific section, what I see wrapping the content is:
<span class="mn-connection-card__name t-16 t-black t-bold">
Bob McBobface
</span>
However, the HTML I get back with BeautifulSoup is disappointing:
{"request":"/voyager/api/configuration","status":200,"body":"bpr-guid-3322365"}
{"status":401}
I've tried fiddling with the Requests library, but to no avail. I'm a beginner, so I'm hoping I don't need to spend a few weeks learning about OAuth or Selenium.
Here's my code:
from bs4 import BeautifulSoup
import urllib.request

url = "https://www.linkedin.com/mynetwork/invite-connect/connections/"
# Fetch the connections page and parse whatever comes back.
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, 'html.parser')
#print(soup)
# Look for the span(s) that wrap a connection's name.
content_list = soup.find_all('span', class_="mn-connection-card__name t-16 t-black t-bold")
print(content_list)
Running this returns an empty list: [], whereas I would expect: "Bob McBobface".
When I print(soup), it just returns a short HTML blurb with the 401-Error notice you see above.
Any advice?
Upvotes: 1
Views: 1564
Reputation: 2306
LinkedIn requires you to be logged in to access that page, and your call does not send any authentication (no cookies, no login headers). A 401 status means the request was unauthorized, which matches the {"status":401} blurb you are getting back instead of the page content.
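To confirm that this is what is happening before you try to parse anything, you can check the response for the 401 marker. The following is only a rough sketch using the standard library: fetch and looks_unauthenticated are hypothetical helper names, the optional headers argument is a placeholder (which headers or cookies LinkedIn accepts is not covered here), and the body check is a heuristic based on the blurb in the question.

```python
import html
import urllib.error
import urllib.request


def fetch(url, headers=None):
    """Return (status_code, body_text) for a GET request.

    `headers` is an optional dict of extra request headers. What LinkedIn
    actually requires there is an open question, not something shown here.
    """
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(request) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status, body
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the error object still carries the body.
        return err.code, err.read().decode("utf-8", errors="replace")


def looks_unauthenticated(status, body):
    """Heuristic: a real 401, or the embedded {"status":401} blurb."""
    return status == 401 or '{"status":401}' in html.unescape(body)
```

If looks_unauthenticated(*fetch(url)) is true, the problem is the missing login rather than your BeautifulSoup query.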
This question answers how to authenticate properly with LinkedIn
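Once the request is authenticated and the HTML you get back really contains the span from your question, the parsing step itself is short. Here is a minimal sketch that runs against the snippet you posted rather than the live page; first_connection_name is a hypothetical helper name, and it assumes the name span keeps the mn-connection-card__name class. It selects on that one class only, so it does not depend on the styling classes (t-16, t-black, t-bold) staying in the same order.

```python
from bs4 import BeautifulSoup

# The markup from the question, used here as stand-in input.
SAMPLE_HTML = """
<span class="mn-connection-card__name t-16 t-black t-bold">
Bob McBobface
</span>
"""


def first_connection_name(page_html):
    """Return the text of the first connection-name span, or None."""
    soup = BeautifulSoup(page_html, "html.parser")
    # CSS selector: any <span> whose class list includes this one class.
    tag = soup.select_one("span.mn-connection-card__name")
    return tag.get_text(strip=True) if tag else None


print(first_connection_name(SAMPLE_HTML))  # prints: Bob McBobface
```

Note that find_all returns a list of tags, not strings, so you need get_text() (or .text) to end up with just "Bob McBobface".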
Upvotes: 1