Leo Lion

Reputation: 43

Read the text of a web page in Python

I know this question, or similar ones, has been asked before, but the answers I found didn't work for me, so I'm asking here.

How can I get the text of an HTML page so that I can compare it to other given values?

Let's say I have this web page:

<html>
<head>
<title>This is my page</title>

<center>
<div class="mon_title">Some title here</div>
<table class="mon_list" >
<tr class='list'><th class="list" align="center"></th><th class="list" align="center">Set 1</th><th class="list" align="center">Set 2</th><th class="list" align="center">Set 4</th><th class="list" align="center">Set 5</th><th class="list" align="center">Set 6</th><th class="list" align="center">Set 7</th><th class="list" align="center">Set 8</th><th class="list" align="center">Set 9</th><th class="list" align="center">Set 10</th><th class="list" align="center">Set 11</th><th class="list" align="center">Set 12</th></tr>
<tr class='list even'><td class="list" align="center">Value 1</td><td class="list" align="center">Value 2</td><td class="list" align="center">Value 3</td><td class="list" align="center">Value 4</td><td class="list" align="center">Value 5</td><td class="list">Value 6</td><td class="list">Value 7</td><td class="list" align="center">Value 8</td><td class="list" align="center">Value 9</td><td class="list" align="center">Value 10</td><td class="list" align="center">Value 11</td><td class="list" align="center">Value 12</td></tr>
<tr class='list even'><td class="list" align="center">Value 1</td><td class="list" align="center">Value 2</td><td class="list" align="center">Value 3</td><td class="list" align="center">Value 4</td><td class="list" align="center">Value 5</td><td class="list">Value 6</td><td class="list">Value 7</td><td class="list" align="center">Value 8</td><td class="list" align="center">Value 9</td><td class="list" align="center">Value 10</td><td class="list" align="center">Value 11</td><td class="list" align="center">Value 12</td></tr>
</table>

Sorry for any typos or missing parts; I hope you get the idea of the page. My program should check whether certain given values appear in the table, for example "Is Value 2 somewhere in it?", and if so, it should then check "Is Value 5 in the same row?"

Is that generally possible? How much effort would it take to write such a program?

All I have so far is a download of the full HTML of the page, using this Python code:

import requests

url = 'http://some.random.site.com/you/ad/here'
print(requests.get(url).text)

which gives me the HTML code you see above. What I want instead is the text you get when you press CTRL+A on a website and copy and paste it into an editor.

PS: I'm fairly new to programming, so sorry if there are concepts I don't really understand. Also, sorry for my English, I'm German.

Upvotes: 4

Views: 109

Answers (3)

Ajax1234

Reputation: 71471

You can use urllib and re to find the values:

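import re
import urllib.request

url = 'http://some.random.site.com/you/ad/here'  # URL from the question
# Decode the bytes so the pattern runs on text, not on the repr of a bytes object
data = urllib.request.urlopen(url).read().decode()

# Raw string so \d is not treated as a string escape
values = re.findall(r"Value \d+", data)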

Output:

['Value 1', 'Value 2', 'Value 3', 'Value 4', 'Value 5', 'Value 6', 'Value 7', 'Value 8', 'Value 9', 'Value 10', 'Value 11', 'Value 12', 'Value 1', 'Value 2', 'Value 3', 'Value 4', 'Value 5', 'Value 6', 'Value 7', 'Value 8', 'Value 9', 'Value 10', 'Value 11', 'Value 12']
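The flat list above loses the information about which row a value came from, which the question needs for the "same row" check. A rough sketch of one way to keep it with regular expressions is to split on the rows first. The sample HTML string here is a shortened, made-up stand-in for the downloaded page, and matching HTML with regular expressions is fragile in general, so treat this as an illustration only:

```python
import re

# Shortened, hypothetical stand-in for the downloaded page.
data = (
    "<table class='mon_list'>"
    "<tr><th>Set 1</th><th>Set 2</th></tr>"
    "<tr><td>Value 1</td><td>Value 2</td><td>Value 5</td></tr>"
    "<tr><td>Value 3</td><td>Value 4</td></tr>"
    "</table>"
)

# Split the page into <tr>...</tr> chunks, then collect the values per chunk.
rows = re.findall(r"<tr\b.*?</tr>", data, flags=re.DOTALL)
values_per_row = [re.findall(r"Value \d+", row) for row in rows]
print(values_per_row)
```

Each inner list then corresponds to one table row; the header row yields an empty list because it contains no "Value N" text.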

Upvotes: 2

silvanoe

Reputation: 128

You could use a parsing library such as Beautiful Soup. Your question is also answered here.
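To illustrate what a parser gives you compared with the raw HTML, here is a minimal sketch that uses the standard library's html.parser instead of Beautiful Soup, so nothing has to be installed. The class name TextExtractor and the sample HTML are made up for this example, and unlike a browser's CTRL+A it would also pick up the contents of script or style tags if the page had any:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags; skip pure whitespace.
        text = data.strip()
        if text:
            self.chunks.append(text)


# Hypothetical sample standing in for requests.get(url).text
html = (
    "<div class='mon_title'>Some title here</div>"
    "<table><tr><td>Value 1</td><td>Value 2</td></tr></table>"
)

parser = TextExtractor()
parser.feed(html)
parser.close()

page_text = "\n".join(parser.chunks)
print(page_text)
```

The resulting page_text is an ordinary string, so checks such as "Value 2" in page_text work directly on it.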

Upvotes: 2

whackamadoodle3000

Reputation: 6748

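import requests
from bs4 import BeautifulSoup

url = 'http://some.random.site.com/you/ad/here'
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(requests.get(url).text, 'html.parser')
table = soup.find(class_='mon_list')  # the <table class="mon_list"> element

listy = []
for tr in table.find_all('tr'):
    # The header row has only <th> cells, so it contributes an empty list
    cols = tr.find_all('td')
    listy.append([elem.get_text() for elem in cols])
print(listy)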

This gives you a nested list with one inner list per table row (the first one is empty because the header row has no td cells):

[[], ['Value 1', 'Value 2', 'Value 3', 'Value 4', 'Value 5', 'Value 6', 'Value 7', 'Value 8', 'Value 9', 'Value 10', 'Value 11', 'Value 12'], ['Value 1', 'Value 2', 'Value 3', 'Value 4', 'Value 5', 'Value 6', 'Value 7', 'Value 8', 'Value 9', 'Value 10', 'Value 11', 'Value 12']]
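With the rows in a nested list like this, the check from the question ("is Value 2 in the table, and is Value 5 in the same row?") could be sketched as follows. The function name values_in_same_row and the shortened sample list are only for illustration:

```python
def values_in_same_row(rows, first, second):
    """Return True if at least one row contains both values."""
    return any(first in row and second in row for row in rows)


# Shortened, made-up sample in the same shape as the nested list above.
listy = [
    [],
    ["Value 1", "Value 2", "Value 5"],
    ["Value 3", "Value 4"],
]

print(values_in_same_row(listy, "Value 2", "Value 5"))  # True
print(values_in_same_row(listy, "Value 2", "Value 3"))  # False
```

The comparison is an exact string match per cell, so "Value 1" does not accidentally match "Value 10".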

Upvotes: 1
