Reputation: 84
I'm working on a project where I collect proxies from sslproxies.org, but sorting the proxies scraped from the table into a clean list without the extra info is proving hard. So far my code isn't working, and I hope you can help. What I want to do is delete the six items that follow every two items in the list.
import requests
from bs4 import BeautifulSoup

f = open("proxies.txt", 'w+')

def getProxy():
    url = "https://www.sslproxies.org"
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    global tlist
    tlist = []
    # collect every <td> cell from every table row
    for tr in soup.find_all('tr'):
        for td in tr.find_all('td'):
            tlist.append(td)
    clist = tlist
    count = 0
    # attempt to drop the six items after every two -- this is the part that doesn't work
    for word in clist:
        count += 1
        if count > 2:
            clist.remove(word)
            count += 1
            if count >= 6:
                count = 0
        else:
            continue
    f.write(str(clist))
Upvotes: 0
Views: 89
Reputation: 61063
Here is a generator that yields two items, then skips six, then yields two more, and so on:
def skip_six(l):
    # keep indices 0 and 1 of every block of eight (two kept, six skipped)
    for i, x in enumerate(l):
        if i % 8 <= 1:
            yield x
You can use this to make a list like so:
clist = list(skip_six(tlist))
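Since tlist holds BeautifulSoup Tag objects rather than plain strings, you would probably also want to pull out the cell text. A minimal sketch of how that might look (the ip:port output format is just an assumption about what you want in the file):

clist = [td.get_text(strip=True) for td in skip_six(tlist)]
# pair consecutive IP and port values, e.g. "1.2.3.4:8080" -- formatting assumed
proxies = [f"{ip}:{port}" for ip, port in zip(clist[0::2], clist[1::2])]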
Upvotes: 2
Reputation: 9731
I believe you want to select the first two columns. In that case you may want to try something like this with pandas' read_html. Just note that I cannot access the website you mentioned, so I haven't tested this code:
import pandas as pd

# read_html returns a list of DataFrames, one per table found on the page
dfs = pd.read_html('https://www.sslproxies.org')
df = dfs[0]
print(df)
print(df[['IP Address', 'Port']])  # select the columns that you are interested in
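If the end goal is the proxies.txt file from the question, the two columns can also be joined and written out directly. This is only a sketch and assumes the table really has 'IP Address' and 'Port' headers:

proxies = df['IP Address'].astype(str) + ':' + df['Port'].astype(str)
proxies.to_csv('proxies.txt', index=False, header=False)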
Upvotes: 0