tisaconundrum

Reputation: 2292

(Python) How to convert my code from Python 2.7 to Python 3

I'm trying to build a basic website crawler in Python. However, the code I found on this website is written for Python 2.7. How can I write it for Python 3 or later? I've begun trying to convert it, but I keep running into errors.

import re
import urllib

textfile = open('depth_1.txt', 'wt')
print("Enter the URL you wish to crawl..")
print('Usage  - "http://phocks.org/stumble/creepy/" <-- With the double quotes')
myurl = input("@> ")
for i in re.findall('''href=["'](.[^"']+)["']''', urllib.urlopen(myurl).read(), re.I):
    print(i)
    for ee in re.findall('''href=["'](.[^"']+)["']''', urllib.urlopen(i).read(), re.I):
        print(ee)
        textfile.write(ee+'\n')
textfile.close()

Upvotes: 1

Views: 943

Answers (1)

Jeon

Reputation: 4076

Save your Python 2 code to a file, say 2.py:

import re
import urllib

textfile = open('depth_1.txt', 'wt')
print("Enter the URL you wish to crawl..")
print('Usage  - "http://phocks.org/stumble/creepy/" <-- With the double quotes')
myurl = input("@> ")
for i in re.findall('''href=["'](.[^"']+)["']''', urllib.urlopen(myurl).read(), re.I):
    print(i)
    for ee in re.findall('''href=["'](.[^"']+)["']''', urllib.urlopen(i).read(), re.I):
        print(ee)
        textfile.write(ee+'\n')
textfile.close()

Convert it with 2to3

2to3 -w 2.py

Now list the directory with dir or ls:

> dir
2016-09-24  01:53               533 2.py
2016-09-24  01:51               475 2.py.bak

2.py.bak is a backup of your original code, and 2.py now contains the Python 3 code.
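If you would rather see only the lines that changed, the backup and the converted file can be compared with the standard difflib module. This is a minimal sketch; the helper name diff_lines is made up for illustration, and the file names are the ones from the listing above.

```python
import difflib


def diff_lines(old_text, new_text, old_name="2.py.bak", new_name="2.py"):
    """Return a unified diff between two versions of a script (hypothetical helper)."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",  # the input lines carry no newline, so the headers should not either
        )
    )


def print_changes(old_path="2.py.bak", new_path="2.py"):
    """Print what 2to3 changed, assuming both files are in the current directory."""
    with open(old_path) as old_file, open(new_path) as new_file:
        for line in diff_lines(old_file.read(), new_file.read(), old_path, new_path):
            print(line)
```

Calling print_changes() in the directory that holds both files prints removed lines prefixed with - and added lines prefixed with +.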

See what changes have been made

import re
import urllib.request, urllib.parse, urllib.error

textfile = open('depth_1.txt', 'wt')
print("Enter the URL you wish to crawl..")
print('Usage  - "http://phocks.org/stumble/creepy/" <-- With the double quotes')
myurl = eval(input("@> "))
for i in re.findall('''href=["'](.[^"']+)["']''', urllib.request.urlopen(myurl).read(), re.I):
    print(i)
    for ee in re.findall('''href=["'](.[^"']+)["']''', urllib.request.urlopen(i).read(), re.I):
        print(ee)
        textfile.write(ee+'\n')
textfile.close()

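2to3 takes care of the mechanical changes, which covers most of the work when your code uses only built-ins and standard modules. Here it renamed urllib.urlopen to urllib.request.urlopen. Two things still need manual attention in Python 3. First, urlopen(...).read() returns bytes, so re.findall with a str pattern raises a TypeError until the data is decoded. Second, eval(input(...)) evaluates whatever is typed, which is why the URL has to be entered with quotes; plain input() already returns a string in Python 3.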
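One way the converted script could be finished by hand is sketched below. It is an illustration and not what 2to3 produces: the function names (extract_links, fetch_text, crawl) are invented here, falling back to UTF-8 when the server does not declare a charset is an assumption, and resolving relative links with urljoin and skipping pages that fail to load are additions that the original script does not make.

```python
import re
import urllib.request
from urllib.parse import urljoin

# Same pattern as in the question, compiled once.
HREF_RE = re.compile(r'''href=["'](.[^"']+)["']''', re.I)


def extract_links(html):
    """Return every href value found in an already decoded HTML string."""
    return HREF_RE.findall(html)


def fetch_text(url):
    """Download a page and decode the bytes to str."""
    with urllib.request.urlopen(url) as response:
        # Assumption: fall back to UTF-8 if no charset is declared.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def crawl(start_url, out_path="depth_1.txt"):
    """Write the links found one level below start_url to out_path."""
    with open(out_path, "wt", encoding="utf-8") as textfile:
        for link in extract_links(fetch_text(start_url)):
            print(link)
            # Addition: turn relative links into absolute URLs.
            page_url = urljoin(start_url, link)
            try:
                sub_links = extract_links(fetch_text(page_url))
            except (OSError, ValueError):
                # Addition: skip pages that cannot be fetched (e.g. mailto: links).
                continue
            for sub_link in sub_links:
                print(sub_link)
                textfile.write(sub_link + "\n")
```

With this version the URL is read with myurl = input("@> ") and typed without quotes, then passed to crawl(myurl).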

Upvotes: 2
