Reputation: 2292
I'm trying to build a basic website crawler in Python. However, the code I've gathered from this website is for Python 2.7, and I'm wondering how to write it for Python 3 or later. I've begun trying to convert it, but I keep running into errors.
import re
import urllib

textfile = open('depth_1.txt', 'wt')
print("Enter the URL you wish to crawl..")
print('Usage - "http://phocks.org/stumble/creepy/" <-- With the double quotes')
myurl = input("@> ")
for i in re.findall('''href=["'](.[^"']+)["']''', urllib.urlopen(myurl).read(), re.I):
    print(i)
    for ee in re.findall('''href=["'](.[^"']+)["']''', urllib.urlopen(i).read(), re.I):
        print(ee)
        textfile.write(ee+'\n')
textfile.close()
Upvotes: 1
Views: 943
Reputation: 4076
Say your script is saved as 2.py:

import re
import urllib

textfile = open('depth_1.txt', 'wt')
print("Enter the URL you wish to crawl..")
print('Usage - "http://phocks.org/stumble/creepy/" <-- With the double quotes')
myurl = input("@> ")
for i in re.findall('''href=["'](.[^"']+)["']''', urllib.urlopen(myurl).read(), re.I):
    print(i)
    for ee in re.findall('''href=["'](.[^"']+)["']''', urllib.urlopen(i).read(), re.I):
        print(ee)
        textfile.write(ee+'\n')
textfile.close()
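Run 2to3 on it with the -w flag, which writes the converted code back to the file:

2to3 -w 2.py

Then list the directory with dir (or ls):

> dir
2016-09-24 01:53 533 2.py
2016-09-24 01:51 475 2.py.bak

2.py.bak is your original code and 2.py is the Python 3 code: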
import re
import urllib.request, urllib.parse, urllib.error

textfile = open('depth_1.txt', 'wt')
print("Enter the URL you wish to crawl..")
print('Usage - "http://phocks.org/stumble/creepy/" <-- With the double quotes')
myurl = eval(input("@> "))
for i in re.findall('''href=["'](.[^"']+)["']''', urllib.request.urlopen(myurl).read(), re.I):
    print(i)
    for ee in re.findall('''href=["'](.[^"']+)["']''', urllib.request.urlopen(i).read(), re.I):
        print(ee)
        textfile.write(ee+'\n')
textfile.close()
This approach works as long as your code uses only built-ins and standard library modules, which is the case here.
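One caveat worth noting: in Python 3, urlopen(...).read() returns bytes, so passing it to re.findall with a str pattern raises a TypeError, and the eval(input(...)) that 2to3 emits will execute whatever the user types. Below is a minimal hand-adjusted sketch, not the 2to3 output itself. The helper names extract_links, fetch, and crawl are hypothetical, the UTF-8 decoding is an assumption about the pages being crawled, and resolving relative links with urljoin is an addition that the original script does not make.

```python
import re
import urllib.request
from urllib.parse import urljoin

# Same pattern as the original script, compiled once.
HREF_RE = re.compile(r'''href=["'](.[^"']+)["']''', re.I)


def extract_links(html):
    """Return every href value found in an HTML string."""
    return HREF_RE.findall(html)


def fetch(url):
    """Download a page and decode the bytes to str (assumes UTF-8)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode('utf-8', errors='replace')


def crawl(start_url, out_path='depth_1.txt'):
    """Write the links found one level below start_url to out_path."""
    with open(out_path, 'wt') as textfile:
        for link in extract_links(fetch(start_url)):
            # Assumption: relative links should be resolved against the start page.
            page_url = urljoin(start_url, link)
            print(page_url)
            try:
                html = fetch(page_url)
            except (OSError, ValueError) as exc:
                # Skip links that cannot be fetched (network errors,
                # unsupported or malformed URLs) instead of aborting.
                print('skipped', page_url, exc)
                continue
            for sub_link in extract_links(html):
                print(sub_link)
                textfile.write(sub_link + '\n')
```

To run it, call crawl(input("@> ")) and type the URL without the surrounding quotes, since the input is no longer passed through eval.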
Upvotes: 2