HappySpaceBoy

Reputation: 21

401 Error when Webscraping LinkedIn with BeautifulSoup

I am trying to use Python's BeautifulSoup library to extract HTML from my LinkedIn "Recently Added Connections" page. Specifically, I want the name of the most recent connection, which appears near the top of the page.

When I inspect the HTML for this specific section, what I see wrapping the content is:

<span class="mn-connection-card__name t-16 t-black t-bold">
      Bob McBobface
    </span>

However, the HTML I get back with BeautifulSoup does not contain that element. Instead, it includes this:

{"request":"/voyager/api/configuration","status":200,"body":"bpr-guid-3322365"}

{"status":401}

I've tried fiddling with the Requests library, but to no avail. I'm a beginner, so I'm hoping I don't need to spend a few weeks learning about OAuth or Selenium.

Here's my code:

from bs4 import BeautifulSoup
import urllib.request

url = "https://www.linkedin.com/mynetwork/invite-connect/connections/"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, 'html.parser')
#print(soup)
content_list = soup.find_all('span',class_="mn-connection-card__name t-16 t-black t-bold")
print(content_list)

Running this returns an empty list: [], whereas I would expect: "Bob McBobface".

When I print(soup), it just returns a short HTML blurb with the 401-Error notice you see above.

Any advice?

Upvotes: 1

Views: 1564

Answers (1)

Erik Overflow

Reputation: 2306

LinkedIn requires you to be logged in to access that page, and your call does not appear to send any authentication. A 401 status (Unauthorized) means the request lacked valid credentials, so that lines up with what you're seeing.
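To illustrate what "adding authentication" could look like with the same urllib.request module used in the question, here is a minimal sketch that attaches a browser session cookie to the request. The helper name build_authenticated_request, the placeholder cookie value, and the use of a cookie named "li_at" are all assumptions for illustration; LinkedIn may require additional cookies or headers, so treat this as a starting point rather than a confirmed recipe.

```python
import urllib.request

URL = "https://www.linkedin.com/mynetwork/invite-connect/connections/"


def build_authenticated_request(url, session_cookie):
    """Build (but do not send) a request that carries a session cookie.

    Hypothetical helper: the cookie name "li_at" is an assumption, and the
    value would have to be copied from your own logged-in browser session.
    """
    headers = {
        "Cookie": f"li_at={session_cookie}",
        # Some sites reject the default urllib User-Agent, so send a browser-like one.
        "User-Agent": "Mozilla/5.0",
    }
    return urllib.request.Request(url, headers=headers)


# Placeholder value; replace it with a real cookie value before sending.
req = build_authenticated_request(URL, "PASTE_YOUR_COOKIE_VALUE")

# Uncomment to actually send the request:
# page = urllib.request.urlopen(req)
```

The request is only constructed here, not sent, so the snippet runs without network access. Passing req to urllib.request.urlopen in place of the bare URL is the only change the original code would need.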

This question answers how to authenticate properly with LinkedIn
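Once an authenticated response is available, extracting the name might look like the sketch below. It parses the exact span quoted in the question, so it assumes that element is present in the returned HTML; if the page builds its content with JavaScript, the span will not be in the raw response and this will find nothing. The variable names are illustrative.

```python
from bs4 import BeautifulSoup

# The markup quoted in the question, standing in for an authenticated response.
html = """
<span class="mn-connection-card__name t-16 t-black t-bold">
      Bob McBobface
    </span>
"""

soup = BeautifulSoup(html, "html.parser")

# Matching on one class is less brittle than matching the full class string,
# which breaks if the styling classes (t-16, t-black, ...) change.
span = soup.find("span", class_="mn-connection-card__name")

# get_text(strip=True) drops the surrounding whitespace and newlines.
name = span.get_text(strip=True) if span is not None else None
print(name)
```

Using find instead of find_all returns the first match, which corresponds to the most recent connection if the page lists them newest first.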

Upvotes: 1
