salimfh

Reputation: 11

Scraping a LinkedIn profile without Selenium or the API

I want to scrape a LinkedIn profile by its URL, for example:

https://www.linkedin.com/in/andrew-marson-90a74015/

I want to extract some data from it.

I used Selenium before, but I want something faster.

So I want to use requests.get(url, auth=****). Since you need to be logged in to see the data, can I fetch the page with auth=('user', 'pass')?

import requests
from requests.auth import HTTPBasicAuth
test = requests.get('https://www.linkedin.com/in/andrew-marson-90a74015/', auth=HTTPBasicAuth('user', 'pass'))
print(test.text)
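
For context, here is a minimal sketch of what `auth=HTTPBasicAuth('user', 'pass')` adds to the request: HTTP Basic authentication only sets an `Authorization` header. The helper name `basic_auth_header` is made up for illustration, and the sketch uses only the standard library. A site that signs users in through an HTML form with a CSRF token, which is what the answers below assume for LinkedIn, would be expected to ignore this header.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value used by HTTP Basic auth.

    Hypothetical helper for illustration; it mirrors the scheme, not
    the internals of any particular library.
    """
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# The credentials are only base64-encoded, not turned into a login session.
header = basic_auth_header("user", "pass")
print(header)
```

Nothing in this header creates cookies or a logged-in session, so it probably cannot replace a form-based login.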

Upvotes: 0

Views: 2285

Answers (2)

Gulshan Yadav

Reputation: 136

Use BeautifulSoup or Scrapy; both work fine for scraping tasks that don't require Selenium.

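from requests import Session
from bs4 import BeautifulSoup as bs

with Session() as s:
    # Load the home page first so the session stores any cookies it sets.
    site = s.get("https://www.linkedin.com/")
    bs_content = bs(site.content, "html.parser")
    # Read the CSRF token from the hidden input in the login form.
    token = bs_content.find("input", {"name": "loginCsrfParam"})["value"]
    # Placeholder credentials; replace with your own.
    login_data = {"username": "admin",
                  "password": "12345", "loginCsrfParam": token}
    # Submit the login form through the same session.
    s.post("https://www.linkedin.com/login", data=login_data)
    # Later requests send the cookies stored in this session.
    home_page = s.get("https://www.linkedin.com/")
    print(home_page.content)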
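
One thing to watch in the token lookup above: `find` returns `None` when the page has no `loginCsrfParam` input, so indexing it with `["value"]` raises a `TypeError`. Below is a rough sketch of a lookup that returns `None` instead of crashing. It uses the standard-library `html.parser` rather than BeautifulSoup so it runs without extra packages; the class name `HiddenInputFinder`, the function `find_hidden_value`, and both sample HTML strings are made up for illustration and are not LinkedIn's real markup.

```python
from html.parser import HTMLParser
from typing import Optional


class HiddenInputFinder(HTMLParser):
    """Remember the value of the first <input> with a given name."""

    def __init__(self, field_name: str) -> None:
        super().__init__()
        self.field_name = field_name
        self.value: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Ignore non-input tags, and stop looking once a value is found.
        if tag != "input" or self.value is not None:
            return
        attributes = dict(attrs)
        if attributes.get("name") == self.field_name:
            self.value = attributes.get("value")


def find_hidden_value(html: str, field_name: str) -> Optional[str]:
    """Return the input's value, or None if the field is absent."""
    finder = HiddenInputFinder(field_name)
    finder.feed(html)
    return finder.value


# Hypothetical sample pages, not real LinkedIn HTML.
SAMPLE_WITH_TOKEN = (
    '<form><input type="hidden" name="loginCsrfParam" value="abc-123"></form>'
)
SAMPLE_WITHOUT_TOKEN = '<form><input type="text" name="session_key"></form>'

token = find_hidden_value(SAMPLE_WITH_TOKEN, "loginCsrfParam")
missing = find_hidden_value(SAMPLE_WITHOUT_TOKEN, "loginCsrfParam")

print(token)
print(missing)
```

If the lookup returns `None`, the page you received is probably not the login form you expected, and it is worth printing the response before trying to post credentials.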

Upvotes: 1

salimfh

Reputation: 11

import requests
from bs4 import BeautifulSoup

email = ""
password = ""

client = requests.Session()

HOMEPAGE_URL = 'https://www.linkedin.com'
LOGIN_URL = 'https://www.linkedin.com/uas/login'

html = client.get(HOMEPAGE_URL).content
soup = BeautifulSoup(html, "html.parser")
csrf = soup.find('input', {'name': 'loginCsrfParam'}).get('value')

login_information = {
    'session_key': email,
    'session_password': password,
    'loginCsrfParam': csrf,
    'trk': 'guest_homepage-basic_sign-in-submit'
}

client.post(LOGIN_URL, data=login_information)

response = client.get('https://www.linkedin.com/in/andrew-marson-90a74015/')
print(response.text)

This code does not work.

Upvotes: 1
