Reputation: 43
I know this question or similar ones have already been asked, but the ones I found didn't give me the right answer, so I am asking here.
How can I get the text of an HTML page so that I can compare it to other given values?
Let's say I have this web page:
<html>
<head>
<title>This is my page</title>
<center>
<div class="mon_title">Some title here</div>
<table class="mon_list" >
<tr class='list'><th class="list" align="center"></th><th class="list" align="center">Set 1</th><th class="list" align="center">Set 2</th><th class="list" align="center">Set 4</th><th class="list" align="center">Set 5</th><th class="list" align="center">Set 6</th><th class="list" align="center">Set 7</th><th class="list" align="center">Set 8</th><th class="list" align="center">Set 9</th><th class="list" align="center">Set 10</th><th class="list" align="center">Set 11</th><th class="list" align="center">Set 12</th></tr>
<tr class='list even'><td class="list" align="center">Value 1</td><td class="list" align="center">Value 2</td><td class="list" align="center">Value 3</td><td class="list" align="center">Value 4</td><td class="list" align="center">Value 5</td><td class="list">Value 6</td><td class="list">Value 7</td><td class="list" align="center">Value 8</td><td class="list" align="center">Value 9</td><td class="list" align="center">Value 10</td><td class="list" align="center">Value 11</td><td class="list" align="center">Value 12</td></tr>
<tr class='list even'><td class="list" align="center">Value 1</td><td class="list" align="center">Value 2</td><td class="list" align="center">Value 3</td><td class="list" align="center">Value 4</td><td class="list" align="center">Value 5</td><td class="list">Value 6</td><td class="list">Value 7</td><td class="list" align="center">Value 8</td><td class="list" align="center">Value 9</td><td class="list" align="center">Value 10</td><td class="list" align="center">Value 11</td><td class="list" align="center">Value 12</td></tr>
</table>
Sorry for any typos or missing parts; I hope you get the idea of the page. Now my program should check whether certain given values appear in the table, for example "Is Value 2 somewhere in it?" and, if it is, "Is Value 5 in the same row?"
Is that generally possible? How much effort would it take to write the program?
All I have so far is the download of the full HTML page, using this Python code:
import requests
url = 'http://some.random.site.com/you/ad/here'
print (requests.get(url).text)
which gives me the HTML code you see above. What I want instead is what you get when you press Ctrl+A on a website and copy and paste it into an editor.
PS: I'm fairly new to programming, so sorry if there are any concepts I don't really get. Also, sorry for my English, I'm German.
Upvotes: 4
Views: 109
Reputation: 71471
You can use urllib and re to find the values:
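import re
import urllib.request
url = 'http://some.random.site.com/you/ad/here'  # URL from the question
# Decode the bytes to text (assuming UTF-8); str() on bytes would keep the b'...' wrapper
data = urllib.request.urlopen(url).read().decode('utf-8', errors='replace')
values = re.findall(r"Value \d+", data)  # raw string, so \d is not a string escape
print(values)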
Output:
['Value 1', 'Value 2', 'Value 3', 'Value 4', 'Value 5', 'Value 6', 'Value 7', 'Value 8', 'Value 9', 'Value 10', 'Value 11', 'Value 12', 'Value 1', 'Value 2', 'Value 3', 'Value 4', 'Value 5', 'Value 6', 'Value 7', 'Value 8', 'Value 9', 'Value 10', 'Value 11', 'Value 12']
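The flat list above no longer shows which row a value came from, which the "same row" part of the question needs. A minimal sketch of one way to keep that information with the same regex approach: split the page into `<tr>` blocks first, then search each block. The `html` string is a shortened, hypothetical stand-in for the downloaded page, and matching HTML with regular expressions is fragile, so treat this as an illustration only.

```python
import re

# Hypothetical stand-in for the downloaded page (shortened from the question).
html = """
<table class="mon_list">
<tr class='list'><th class="list">Set 1</th><th class="list">Set 2</th></tr>
<tr class='list even'><td class="list">Value 1</td><td class="list">Value 2</td><td class="list">Value 5</td></tr>
<tr class='list odd'><td class="list">Value 3</td><td class="list">Value 4</td></tr>
</table>
"""

# Find every <tr>...</tr> block, then collect the values inside each block.
row_pattern = re.compile(r"<tr\b.*?</tr>", re.DOTALL | re.IGNORECASE)
rows = [re.findall(r"Value \d+", row) for row in row_pattern.findall(html)]

# Keep only the rows that contain both values of interest.
matches = [row for row in rows if "Value 2" in row and "Value 5" in row]

print(rows)
print(matches)
```

The header row ends up as an empty list because it contains no text of the form "Value N".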
Upvotes: 2
Reputation: 128
You could use a parsing library such as Beautiful Soup. Your question is also answered here.
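To give an idea of what a parser does for the "Ctrl+A" use case, here is a rough sketch that uses `html.parser` from the Python standard library instead of Beautiful Soup, so nothing has to be installed. The names `TextExtractor`, `visible_text` and `sample` are made up for this example. Note that it does not skip the contents of `<script>` or `<style>` tags, which a real page would probably need.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text between tags, one entry per non-empty text node."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for every piece of text between tags; ignore pure whitespace.
        text = data.strip()
        if text:
            self.chunks.append(text)


def visible_text(html):
    """Return the text nodes of an HTML string in document order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical, shortened version of the page from the question.
sample = (
    '<div class="mon_title">Some title here</div>'
    '<table><tr><td>Value 1</td><td>Value 2</td></tr></table>'
)
print(visible_text(sample))
```

With Beautiful Soup the same result is usually a one-liner via `get_text()`, but the sketch shows that the idea itself is not much code.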
Upvotes: 2
Reputation: 6748
import requests
from bs4 import BeautifulSoup as soup
url = 'http://some.random.site.com/you/ad/here'
# Parse the page and pick the table with class "mon_list"
page = soup(requests.get(url).text, 'html.parser')
table = page.find(class_='mon_list')
listy = []
for tr in table.find_all('tr'):
    # The header row only has <th> cells, so it yields an empty list
    cols = tr.find_all('td')
    listy.append([elem.get_text() for elem in cols])
print(listy)
This will give it to you in a nested list:
[[], ['Value 1', 'Value 2', 'Value 3', 'Value 4', 'Value 5', 'Value 6', 'Value 7', 'Value 8', 'Value 9', 'Value 10', 'Value 11', 'Value 12'], ['Value 1', 'Value 2', 'Value 3', 'Value 4', 'Value 5', 'Value 6', 'Value 7', 'Value 8', 'Value 9', 'Value 10', 'Value 11', 'Value 12']]
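Once the table is a nested list like the one above, the two checks from the question ("Is Value 2 somewhere in it?" and "Is Value 5 in the same row?") come down to plain list membership. A small sketch, where `rows_containing` is a helper name invented for this example and the sample data is shortened and altered so that the two rows differ:

```python
# Hypothetical sample data, shaped like the nested list printed above.
listy = [
    [],
    ["Value 1", "Value 2", "Value 3", "Value 4", "Value 5"],
    ["Value 1", "Value 3", "Value 4"],
]


def rows_containing(rows, *wanted):
    """Return the rows that contain every value in `wanted`."""
    return [row for row in rows if all(value in row for value in wanted)]


# "Is Value 2 somewhere in the table?"
has_value_2 = bool(rows_containing(listy, "Value 2"))

# "Is Value 5 in the same row as Value 2?"
same_row = rows_containing(listy, "Value 2", "Value 5")

print(has_value_2)
print(same_row)
```

The comparison is an exact string match, so stray whitespace in a cell would need a `strip()` before comparing.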
Upvotes: 1