Reputation: 11
I want to scrape a LinkedIn profile by its URL,
for example: https://www.linkedin.com/in/andrew-marson-90a74015/
I want to get some data from it.
I was using Selenium before, but I want to make it faster,
so I want to use requests.get(url, auth=****). To get the data you need to be logged in, so can I fetch the page with auth=('user', 'pass')?
import requests
from requests.auth import HTTPBasicAuth
test = requests.get('https://www.linkedin.com/in/andrew-marson-90a74015/', auth=HTTPBasicAuth('user', 'pass'))
print(test.text)
Upvotes: 0
Views: 2285
Reputation: 136
Use BeautifulSoup or Scrapy; both work fine for scraping tasks that don't require Selenium.
from requests import Session
from bs4 import BeautifulSoup as bs
with Session() as s:
    site = s.get("https://www.linkedin.com/")
    bs_content = bs(site.content, "html.parser")
    token = bs_content.find("input", {"name": "loginCsrfParam"})["value"]
    login_data = {"username": "admin",
                  "password": "12345", "loginCsrfParam": token}
    s.post("https://www.linkedin.com/login", login_data)
    home_page = s.get("https://www.linkedin.com/")
    print(home_page.content)
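As for the auth=HTTPBasicAuth('user', 'pass') idea from the question: HTTP Basic auth only adds an Authorization header to the request. A site that signs users in through an HTML form and session cookies will generally ignore that header, which is why a session plus a form POST is used here instead. A minimal sketch of what that header contains, assuming ASCII credentials (the function name basic_auth_header is made up for illustration):

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value used by HTTP Basic auth."""
    # Basic auth is just "user:pass" base64-encoded; nothing is submitted
    # to a login form and no session cookie is created by it.
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


header = basic_auth_header("user", "pass")
print(header)
```

The encoding is reversible, so this header is not a login step in itself; it only helps when the server is configured to accept Basic auth.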
Upvotes: 1
Reputation: 11
import requests
from bs4 import BeautifulSoup
email = ""
password = ""
client = requests.Session()
HOMEPAGE_URL = 'https://www.linkedin.com'
LOGIN_URL = 'https://www.linkedin.com/uas/login'
html = client.get(HOMEPAGE_URL).content
soup = BeautifulSoup(html, "html.parser")
csrf = soup.find('input', {'name': 'loginCsrfParam'}).get('value')
login_information = {
    'session_key': email,
    'session_password': password,
    'loginCsrfParam': csrf,
    'trk': 'guest_homepage-basic_sign-in-submit'
}
client.post(LOGIN_URL, data=login_information)
response = client.get('https://www.linkedin.com/in/andrew-marson-90a74015/')
print(response.text)
This code does not work.
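One way to narrow down where it fails is to check whether the loginCsrfParam field is present in the downloaded page at all. soup.find(...) returns None when nothing matches, so the .get('value') call raises AttributeError if the page no longer contains that input. The sketch below is only an assumption-laden illustration: it uses the standard library's html.parser instead of BeautifulSoup so it runs without extra packages, the names InputValueFinder and find_input_value are hypothetical, and sample_page is made-up HTML rather than real LinkedIn markup.

```python
from html.parser import HTMLParser
from typing import Optional


class InputValueFinder(HTMLParser):
    """Remember the value of the first <input> with the given name."""

    def __init__(self, field_name: str) -> None:
        super().__init__()
        self.field_name = field_name
        self.value: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "input" and self.value is None:
            attr_map = dict(attrs)
            if attr_map.get("name") == self.field_name:
                self.value = attr_map.get("value")


def find_input_value(html: str, field_name: str) -> Optional[str]:
    """Return the input's value, or None if the field is not in the page."""
    finder = InputValueFinder(field_name)
    finder.feed(html)
    return finder.value


# Made-up HTML for illustration only, not real LinkedIn markup.
sample_page = '<form><input type="hidden" name="loginCsrfParam" value="abc123"></form>'
token = find_input_value(sample_page, "loginCsrfParam")
missing = find_input_value("<html><body>Sign in</body></html>", "loginCsrfParam")
print(token, missing)
```

In the script above, passing client.get(HOMEPAGE_URL).text to such a helper and checking for None before posting would show whether the token lookup or the login request itself is the step that fails.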
Upvotes: 1