Trying to extract 'text' from a tag using Python

Question

I'm trying to extract the proxy IP address from the first column of this page (https://www.proxynova.com/proxy-server-list/country-fr/), just the address itself, for example "178.33.62.155". But when I extract all the text content of the relevant tag, the IP text is missing.

The HTML tag on the website is:

178.33.62.155

I expected the IP address above (after the script tag, inside the enclosing tag) to appear when I print the text content, but it doesn't. With the code below, which is what I have so far, the IP address is the only information that does not appear.

Any idea how to extract this specific IP, and why it does not appear when I extract all the text content of this tag?

from lxml import html
import requests
import re

page = requests.get('https://www.proxynova.com/proxy-server-list/country-fr/')
tree = html.fromstring(page.content.decode('utf-8'))

for elem in tree.xpath('//table[@class="table"]//tbody//td[@align="left"]'):
    print(elem.text_content())

tell k · Accepted Answer

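It looks like the IP is not in the static HTML: each cell's script tag writes it with JavaScript (document.write), and requests does not run JavaScript, so the IP never appears as plain text. I recommend using BeautifulSoup to read the script source and rebuild the IP from it, like this: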

import requests
import re
from bs4 import BeautifulSoup

res = requests.get('https://www.proxynova.com/proxy-server-list/country-fr/')
soup = BeautifulSoup(res.content, "lxml")

REGEX_JS = re.compile(r"^document\.write\('([^']+)'\.substr\(2\) \+ '([^']+)'\);$")

proxy_ip_list = []
for table in soup.find_all("table", id="tbl_proxy_list"):
    for script in table.find_all("script"):
        m = REGEX_JS.search(script.text)
        if m:
            proxy_ip_list.append(m.group(1)[2:] + m.group(2))

for ip in proxy_ip_list:
    print(ip)
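
To check the pattern without hitting the site, here is a minimal offline sketch of the same decoding step. The sample script string and the helper name decode_ip are made up for illustration and are not taken from the page; the sketch assumes the script body has the form document.write('...'.substr(2) + '...'); that the regex above expects.

```python
import re

# Same pattern as in the answer, written as a raw string.
REGEX_JS = re.compile(r"^document\.write\('([^']+)'\.substr\(2\) \+ '([^']+)'\);$")


def decode_ip(script_text):
    """Return the IP the script would write, or None if the text does not match.

    decode_ip is a hypothetical helper name used only in this sketch.
    """
    m = REGEX_JS.search(script_text.strip())
    if m is None:
        return None
    # JavaScript's 'abc'.substr(2) drops the first two characters,
    # which corresponds to the [2:] slice in Python.
    return m.group(1)[2:] + m.group(2)


# Hypothetical script body, invented for illustration (two padding
# characters followed by the start of the IP, then the remainder).
sample = "document.write('12178.33.6'.substr(2) + '2.155');"
print(decode_ip(sample))
```

If the site changes the shape of its script (different padding length, different concatenation), the regex will stop matching and decode_ip will return None, so it is worth checking for that case before using the result.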
