Reputation: 35
I am trying to scrape the proxy list from this site. However, I can't find the value inside the textarea tag.
Here is my code:
import requests
from bs4 import BeautifulSoup
r = requests.get("https://openproxy.space/list/azneonYD26")
soup = BeautifulSoup(r.text, "html.parser")
results = soup.find('section', class_='data')
rows = results.find('textarea')
print(rows)
Upvotes: 2
Views: 373
Reputation: 20052
Actually, you can scrape that <script> tag and extract all the proxy data (country, count, and all the IPs) with a bit of regex magic and some chained replace() calls.
Here's how:
import json
import re

import requests
from bs4 import BeautifulSoup

page = requests.get("https://openproxy.space/list/azneonYD26").text
scripts = BeautifulSoup(page, "html.parser").find_all("script")

# Capture the JavaScript literal that holds the proxy list.
proxy_script = re.search(r"LIST\",data:(.*),code", scripts[2].string).group(1)

# Quote the single-letter values and the bare keys so the text parses as JSON.
proxy_data = json.loads(
    (
        re.sub(r":([a-z])", r':"\1"', proxy_script)
        .replace("code", '"code"')
        .replace("count", '"count"')
        .replace("items", '"items"')
        .replace("active", '"active"')
    )
)

for proxy in proxy_data:
    print(proxy["code"], proxy["count"], proxy["items"][0])
Output:
CN 122 222.129.37.240:57114
US 82 98.188.47.132:4145
DE 51 78.46.218.20:12855
IN 15 43.224.10.37:6667
FR 9 51.195.91.196:9095
AR 8 186.126.181.223:1080
RU 7 217.28.221.10:30005
GB g 46.101.24.42:1080
SG g 8.210.163.246:50001
NL f 188.166.34.137:9000
BD 3 103.85.232.20:1080
NO d 146.59.156.73:9095
CA d 204.101.61.82:4145
BR d 179.189.226.186:8080
HK b 119.28.128.211:1080
AU b 139.99.237.180:9095
VN b 123.16.56.161:1080
KR b 125.135.221.94:54398
TH b 101.108.25.227:9999
BG b 46.10.218.194:1080
AT b 195.144.21.185:1080
VE b 200.35.79.77:1080
IE b 52.214.159.193:9080
ES b 185.66.58.142:42647
JP b 139.162.78.109:1080
UA b 46.151.197.254:8080
PL b 147.135.208.13:9095
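The quoting step can be tried offline on a small string, which may help explain the single-letter counts (g, f, d, b) in the output above: they are presumably minified JavaScript variable names that the regex wraps in quotes, not real counts. This is only a sketch. The sample string and the helper name quote_js_object are made up for illustration, and the real page data may be shaped differently.

```python
import json
import re

# Hypothetical sample, shaped like the text the regex above is assumed to capture.
# The single letters a and g stand in for minified JavaScript variable names.
sample = (
    '[{code:"CN",count:122,items:["222.129.37.240:57114"],active:a},'
    '{code:"GB",count:g,items:["46.101.24.42:1080"],active:a}]'
)


def quote_js_object(text):
    """Turn the JavaScript-style literal into JSON text and parse it."""
    # Wrap the first lowercase letter directly after a colon in quotes.
    quoted = re.sub(r":([a-z])", r':"\1"', text)
    # Quote the bare keys; plain replace() is brittle if a value contains a key name.
    for key in ("code", "count", "items", "active"):
        quoted = quoted.replace(key, f'"{key}"')
    return json.loads(quoted)


for entry in quote_js_object(sample):
    print(entry["code"], entry["count"], entry["items"][0])
```

On this sample the second entry comes back with count equal to the string "g", the same pattern seen in the real output.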
If you want to view everything, just print out the proxy_data variable.
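If what you need is one flat list of addresses rather than one line per country, something like the following might work. It assumes each entry keeps its addresses under the "items" key, as in the code above. The sample_data list and the all_proxies helper are hypothetical stand-ins so the snippet runs without a network request.

```python
# Hypothetical stand-in for proxy_data, using values from the output above.
sample_data = [
    {"code": "CN", "count": 122, "items": ["222.129.37.240:57114"]},
    {"code": "US", "count": 82, "items": ["98.188.47.132:4145"]},
]


def all_proxies(data):
    """Collect every address from every country entry into one list."""
    return [address for entry in data for address in entry["items"]]


for address in all_proxies(sample_data):
    print(address)
```

With the real data you would pass proxy_data instead of sample_data.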
Upvotes: 1