Farbod whatever

Reputation: 35

I can't scrape the value inside a textarea tag using Python

I am trying to scrape the proxy list from this site, but I can't find the value inside the textarea tag.

Here is my code:

import requests
from bs4 import BeautifulSoup
r = requests.get("https://openproxy.space/list/azneonYD26")
soup = BeautifulSoup(r.text, "html.parser")
results = soup.find('section', class_='data')
rows = results.find('textarea')
print(rows)
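One way to see where the proxy list actually lives in the raw HTML (what requests returns, before any JavaScript runs) is a small diagnostic that records the innermost tag around every ip:port string. This is a sketch using the standard library's html.parser rather than BeautifulSoup; `ProxyLocator`, `IP_PORT`, and the sample page below are hypothetical illustrations, not the real page's markup.

```python
import re
from html.parser import HTMLParser

# Matches "ip:port" strings such as 98.188.47.132:4145.
IP_PORT = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}:\d{1,5}\b")
# Elements that never get a closing tag, so they must not stay on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ProxyLocator(HTMLParser):
    """Records the innermost open tag around every ip:port string."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open tags
        self.hits = []   # list of (tag, "ip:port") pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to the matching open tag (tolerates unclosed children).
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        tag = self.stack[-1] if self.stack else None
        for hit in IP_PORT.findall(data):
            self.hits.append((tag, hit))


# Hypothetical page: the textarea is empty and the list sits in a script.
SAMPLE_HTML = """<html><head><meta charset="utf-8"></head><body>
<section class="data"><textarea></textarea></section>
<script>var list = {data: [{code: "US", items: ["98.188.47.132:4145"]}]};</script>
</body></html>"""

locator = ProxyLocator()
locator.feed(SAMPLE_HTML)
print(locator.hits)
```

To try it on the real page, feed it `requests.get(url).text` instead of `SAMPLE_HTML`. If the hits report `script` and none report `textarea`, the textarea is being filled in client-side and the data has to be pulled out of the script.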

Upvotes: 2

Views: 373

Answers (1)

baduker

Reputation: 20052

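The proxy data is embedded in one of the page's <script> tags (the textarea itself is presumably filled in by JavaScript), so you can scrape that tag instead and extract all the proxy data (country, count, and all the IPs) with a regex and a few chained replace() calls.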

Here's how:

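import json
import re

import requests
from bs4 import BeautifulSoup

page = requests.get("https://openproxy.space/list/azneonYD26").text
scripts = BeautifulSoup(page, "html.parser").find_all("script")

# The payload sits in the third <script> tag (index 2); this index depends on
# the page layout and may change. Capture everything between LIST",data: and ,code
proxy_script = re.search(r"LIST\",data:(.*),code", scripts[2].string).group(1)

# The captured text is a JavaScript literal, not JSON, so patch it up:
# quote the lowercase letter that follows a colon (minified variable references),
# then quote the unquoted keys so json.loads() accepts it.
proxy_data = json.loads(
    re.sub(r":([a-z])", r':"\1"', proxy_script)
    .replace("code", '"code"')
    .replace("count", '"count"')
    .replace("items", '"items"')
    .replace("active", '"active"')
)

# Each entry holds a country code, a proxy count and the list of "ip:port" items.
for proxy in proxy_data:
    print(proxy["code"], proxy["count"], proxy["items"][0])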

Output:

CN 122 222.129.37.240:57114
US 82 98.188.47.132:4145
DE 51 78.46.218.20:12855
IN 15 43.224.10.37:6667
FR 9 51.195.91.196:9095
AR 8 186.126.181.223:1080
RU 7 217.28.221.10:30005
GB g 46.101.24.42:1080
SG g 8.210.163.246:50001
NL f 188.166.34.137:9000
BD 3 103.85.232.20:1080
NO d 146.59.156.73:9095
CA d 204.101.61.82:4145
BR d 179.189.226.186:8080
HK b 119.28.128.211:1080
AU b 139.99.237.180:9095
VN b 123.16.56.161:1080
KR b 125.135.221.94:54398
TH b 101.108.25.227:9999
BG b 46.10.218.194:1080
AT b 195.144.21.185:1080
VE b 200.35.79.77:1080
IE b 52.214.159.193:9080
ES b 185.66.58.142:42647
JP b 139.162.78.109:1080
UA b 46.151.197.254:8080
PL b 147.135.208.13:9095

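If you want to view everything, just print the proxy_data variable. Some counts come out as single letters (g, f, d, b) because the re.sub(r":([a-z])", ...) step turns them into strings; they are presumably minified JavaScript variable names whose actual values are not resolved by this approach.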
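The chained replace() calls only quote the keys they name, and `:([a-z])` only quotes the first lowercase letter after a colon. As an alternative (not the answer's code), two regexes can quote any unquoted key and any bare identifier value in one pass, and optionally substitute known values for minified names. This is a minimal sketch, assuming no string value in the payload itself looks like `key:` or `:name,`; `js_literal_to_json`, its `variables` parameter, and the sample string are hypothetical.

```python
import json
import re

# Unquoted object key right after "{" or "," (e.g. {code: or ,count:).
KEY = re.compile(r'([{,])\s*([A-Za-z_$][\w$]*)\s*:')
# Bare identifier used as a value (e.g. count:g), excluding JSON literals.
BARE_VALUE = re.compile(
    r':\s*(?!(?:true|false|null)\b)([A-Za-z_$][\w$]*)(?=\s*[,}\]])'
)


def js_literal_to_json(text, variables=None):
    """Naively turn a minified JS object/array literal into JSON text.

    Bare identifiers found in `variables` are replaced by their values;
    unknown identifiers are kept as strings, like the answer's output.
    """
    variables = variables or {}
    text = KEY.sub(r'\1"\2":', text)

    def value(match):
        name = match.group(1)
        return ":" + json.dumps(variables.get(name, name))

    return BARE_VALUE.sub(value, text)


# Hypothetical sample in the shape the answer's regex captures.
sample = ('[{code:"CN",count:122,items:["222.129.37.240:57114"]},'
          '{code:"GB",count:g,items:["46.101.24.42:1080"]}]')

print(json.loads(js_literal_to_json(sample)))
# {"g": 6} is a made-up mapping, only to show substitution.
print(json.loads(js_literal_to_json(sample, {"g": 6})))
```

On the real page you would pass the answer's `proxy_script` string instead of `sample`. Without a mapping, unknown names stay as strings such as `"g"`; recovering their real values would require finding where the script defines them.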

Upvotes: 1
