Kushagra

Reputation: 31

How to get a token from a webpage with Python

The page has the URL https://www.example.com

<html>
<body>
<button id="button1" onclick="func1()"></button>
<button id="button2" onclick="func2()"></button>
</body>
<script>
function func1(){
  open("/doubt?s=AAAB_BCCCDD");
}

function func2(){
  open("/doubt?s=AABB_CCDDEE");
}
// Something like this; the page itself works.
</script>
</html>

AAAB_BCCCDD and AABB_CCDDEE are both tokens.

I want to get the first token on the page with Python.
My Python code:

import requests

r = requests.get("https://www.example.com")
s = r.text

if "/doubt?s=" in s:
    # After this point I don't know how to continue.
    # I want to store the first token in a variable here.
    pass

Please help me.

Upvotes: 0

Views: 38

Answers (1)

Green 绿色

Reputation: 2886

Usually, after fetching the website's raw text content, you would first parse the HTML using a library like BeautifulSoup. It creates a document object model (DOM) tree, which you can then query for the elements you need.
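
As a rough sketch of that parse-then-query step, the snippet below uses the standard library's html.parser rather than BeautifulSoup so that it has no third-party dependency. The class name ButtonCollector and the inline sample_html are made up for illustration; the markup mirrors the page in the question, with the attribute values quoted.

```python
from html.parser import HTMLParser


class ButtonCollector(HTMLParser):
    """Hypothetical helper: records (id, onclick) for every <button> start tag."""

    def __init__(self):
        super().__init__()
        self.buttons = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "button":
            attr_map = dict(attrs)
            self.buttons.append((attr_map.get("id"), attr_map.get("onclick")))


# Assumed sample markup, modelled on the page in the question.
sample_html = """
<html>
<body>
<button id="button1" onclick="func1()"></button>
<button id="button2" onclick="func2()"></button>
</body>
</html>
"""

collector = ButtonCollector()
collector.feed(sample_html)
print(collector.buttons)
```

The onclick values only name the handler functions; the tokens themselves do not appear in any attribute.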

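However, an HTML parser neither reads nor interprets JavaScript code, so the tokens inside the script's function bodies are not exposed as elements. For your problem, you can instead use regular expressions to extract the necessary information from the raw text.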

Example:

import re
import requests

r = requests.get("https://www.example.com")
s = r.text

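# Match "/doubt?s=" and capture the word characters that follow as the token.
pattern = re.compile(r'/doubt\?s=(?P<token>\w+)')
# findall returns the captured tokens in the order they appear in the text.
matches = pattern.findall(s)
if matches:
    print(matches[0])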
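
To try the extraction without a network call, the same idea can be wrapped in a small function. This is only a sketch: the names first_token and sample_page are assumptions for illustration, and the pattern assumes that tokens consist of letters, digits, and underscores, as in the question.

```python
import re

# Capture the run of word characters that follows "/doubt?s=".
TOKEN_PATTERN = re.compile(r"/doubt\?s=(\w+)")


def first_token(page_text):
    """Return the first token after '/doubt?s=', or None if there is none."""
    match = TOKEN_PATTERN.search(page_text)
    return match.group(1) if match else None


# Assumed sample text, copied from the script in the question.
sample_page = '''
<script>
function func1(){
  open("/doubt?s=AAAB_BCCCDD");
}
function func2(){
  open("/doubt?s=AABB_CCDDEE");
}
</script>
'''

token = first_token(sample_page)
print(token)
```

For the live page, the text returned by requests.get(...).text would be passed to first_token in place of sample_page.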

Upvotes: 1
