Reputation: 15

How to use lxml for web scraping?

I want to write a Python script that fetches my current reputation on Stack Overflow from my profile page: https://stackoverflow.com/users/14483205/raunanza?tab=profile

This is the code I have written.

from lxml import html
import requests

# Download the profile page and parse it into an lxml element tree
page = requests.get('https://stackoverflow.com/users/14483205/raunanza?tab=profile')
tree = html.fromstring(page.content)

Now, how do I get my reputation out of tree? (I can't understand how to use XPath, even after googling it.)
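A minimal, offline sketch of how XPath queries behave in lxml. It assumes a made-up HTML snippet in place of the real page; the id and class names below are hypothetical, not Stack Overflow's.

```python
from lxml import html

# A made-up HTML fragment standing in for a downloaded page.
SNIPPET = """
<html><body>
  <div id="profile">
    <span class="name">Alice</span>
    <div class="stats">
      <span class="rep">1,234</span>
      <span class="badges">5</span>
    </div>
  </div>
</body></html>
"""

tree = html.fromstring(SNIPPET)

# "//" searches the whole document; [@attr="value"] filters on an attribute.
# text() selects the text node, so xpath() returns a list of strings.
names = tree.xpath('//span[@class="name"]/text()')
print(names)  # ['Alice']

# Steps separated by "/" walk down one level at a time; [1] is the first
# matching child (XPath positions start at 1).
rep = tree.xpath('//div[@id="profile"]/div[@class="stats"]/span[1]/text()')
print(rep)  # ['1,234']

# Without text(), xpath() returns element objects instead of strings.
spans = tree.xpath('//div[@class="stats"]/span')
print([s.get("class") for s in spans])  # ['rep', 'badges']
```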

Upvotes: 1

Answers (3)

Ananth

Reputation: 797

You need to add an XPath query to your code that selects the element holding the reputation. Below is the code:

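from lxml import html
import requests

page = requests.get('https://stackoverflow.com/users/14483205/raunanza?tab=profile')
tree = html.fromstring(page.content)
# text() makes xpath() return a list of strings, so take the first match and strip whitespace.
# This XPath depends on the page layout at the time of writing.
title = tree.xpath('//*[@id="avatar-card"]/div[2]/div/div[1]/text()')
print(title[0].strip())  # printed 3 when this answer was written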
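An XPath ending in text() returns a list, and that list is simply empty when the layout no longer matches, so title[0] would raise IndexError. The sketch below uses a hypothetical helper, first_text, assuming you only want the first match; the sample markup merely imitates the structure the XPath above expects and is not the real page.

```python
from lxml import html


def first_text(tree, xpath_expr):
    """Return the stripped text of the first XPath match, or None if nothing matched."""
    results = tree.xpath(xpath_expr)
    if not results:
        return None  # the XPath is wrong or the page layout changed
    first = results[0]
    # If the expression selected an element rather than a text node,
    # text_content() collects all the text inside it.
    if isinstance(first, html.HtmlElement):
        first = first.text_content()
    return str(first).strip()


# Hypothetical markup shaped like what the answer's XPath targets.
SAMPLE = """
<div id="avatar-card">
  <div>avatar</div>
  <div><div><div>
      3
  </div></div></div>
</div>
"""

tree = html.fromstring(SAMPLE)
print(first_text(tree, '//*[@id="avatar-card"]/div[2]/div/div[1]/text()'))  # 3
print(first_text(tree, '//*[@id="missing"]/text()'))  # None
```

With the real page you would pass html.fromstring(page.content) instead of the sample tree.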

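You can get an element's XPath from Chrome DevTools: right-click the element, choose Inspect, then right-click the highlighted node in the Elements panel and choose Copy > Copy XPath.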

To learn more about XPath, see: https://www.w3schools.com/xml/xpath_examples.asp
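One caveat with copied XPaths: they are usually positional (div[2]/div/div[1]), so inserting a single sibling element can break them. The sketch below, using made-up markup and hypothetical class names, compares a positional path with one that selects by class. The concat/normalize-space expression is the common XPath 1.0 idiom for testing whether one class appears in a multi-class attribute.

```python
from lxml import html

# Made-up markup: the class names are hypothetical, not Stack Overflow's.
BEFORE = """
<div id="card">
  <div class="avatar">img</div>
  <div class="rep-box"><span class="rep value">3</span></div>
</div>
"""
# The same markup with one extra sibling inserted at the top.
AFTER = """
<div id="card">
  <div class="banner">new!</div>
  <div class="avatar">img</div>
  <div class="rep-box"><span class="rep value">3</span></div>
</div>
"""

# Positional path, similar in style to what "Copy XPath" produces.
POSITIONAL = '//*[@id="card"]/div[2]/span/text()'
# Class-based path: matches a span whose class list contains "rep".
BY_CLASS = '//span[contains(concat(" ", normalize-space(@class), " "), " rep ")]/text()'

for label, doc in (("before", BEFORE), ("after", AFTER)):
    tree = html.fromstring(doc)
    print(label, tree.xpath(POSITIONAL), tree.xpath(BY_CLASS))
# before ['3'] ['3']
# after [] ['3']
```

Class names can change too, so neither form is guaranteed to survive a redesign; the class-based one just tolerates layout shuffles better.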

Upvotes: 2

mulaixi

Reputation: 168

If you don't mind using BeautifulSoup, you can extract the text directly from the tag that contains your reputation. You need to inspect the page structure first to find that tag and its class.

from bs4 import BeautifulSoup
import requests

page = requests.get('https://stackoverflow.com/users/14483205/raunanza?tab=profile')
soup = BeautifulSoup(page.content, features='lxml')  # parse with the lxml parser

for tag in soup.find_all('strong', {'class': 'ml6 fc-medium'}):
    print(tag.text)
# this printed 3 at the time of writing
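A note on class matching, sketched with made-up markup: passing a string that contains spaces to find_all matches the class attribute's exact value, so the same classes in a different order do not match. A CSS selector via soup.select (backed by the soupsieve package in recent BeautifulSoup versions) matches regardless of order.

```python
from bs4 import BeautifulSoup

# Made-up markup; on a real page, check the actual tag and class names first.
SAMPLE = '<div><strong class="fc-medium ml6">3</strong><strong class="ml6">x</strong></div>'

soup = BeautifulSoup(SAMPLE, "lxml")

# A string with spaces is compared with the class attribute's exact value,
# so the order of the class names matters here.
exact = [t.text for t in soup.find_all("strong", {"class": "ml6 fc-medium"})]
print(exact)  # []

# A CSS selector matches elements that carry both classes, in any order.
by_css = [t.get_text(strip=True) for t in soup.select("strong.ml6.fc-medium")]
print(by_css)  # ['3']
```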

Upvotes: 0

Tasnuva Leeya

Reputation: 2795

A simple solution using BeautifulSoup with the lxml parser:

from bs4 import BeautifulSoup
import requests

# BeautifulSoup uses lxml as its parser here, so lxml must be installed
page = requests.get('https://stackoverflow.com/users/14483205/raunanza?tab=profile').text
tree = BeautifulSoup(page, 'lxml')
name = tree.find("div", {'class': 'grid--cell fw-bold'}).text.strip()
title = tree.find("div", {'class': 'grid--cell fs-title fc-dark'}).text.strip()
print("Stack Overflow reputation of {} is: {}".format(name, title))
# output: Stack Overflow reputation of Raunanza is: 3
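find() returns None when nothing matches, so calling .text on a missing element raises AttributeError instead of giving a clear result. A sketch with a hypothetical helper, text_or_none, and made-up markup that reuses the class names from the answer:

```python
from bs4 import BeautifulSoup


def text_or_none(soup, tag, css_class):
    """Return the stripped text of the first <tag> whose class attribute equals css_class, else None."""
    found = soup.find(tag, {"class": css_class})
    if found is None:
        return None  # nothing matched, e.g. the page layout changed
    return found.get_text(strip=True)


# Made-up markup reusing the answer's class names; not the real page.
SAMPLE = """
<div class="grid--cell fw-bold"> Raunanza </div>
<div class="grid--cell fs-title fc-dark">3</div>
"""

soup = BeautifulSoup(SAMPLE, "lxml")
name = text_or_none(soup, "div", "grid--cell fw-bold")
reputation = text_or_none(soup, "div", "grid--cell fs-title fc-dark")
missing = text_or_none(soup, "div", "no-such-class")

if name is not None and reputation is not None:
    print(f"Stack Overflow reputation of {name} is: {reputation}")
print(missing)
# Stack Overflow reputation of Raunanza is: 3
# None
```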

Upvotes: 0
