Reputation: 7879
I'm trying to create and write to a file. I have the following code:
from urllib2 import urlopen

def crawler(seed_url):
    to_crawl = [seed_url]
    crawled=[]
    while to_crawl:
        page = to_crawl.pop()
        page_source = urlopen(page)
        s = page_source.read()
        with open(str(page)+".txt","a+") as f:
            f.write(s)
            f.close()
    return crawled

if __name__ == "__main__":
    crawler('http://www.yelp.com/')
However, it returns the error:
Traceback (most recent call last):
  File "/Users/adamg/PycharmProjects/NLP-HW1/scrape-test.py", line 29, in <module>
    crawler('http://www.yelp.com/')
  File "/Users/adamg/PycharmProjects/NLP-HW1/scrape-test.py", line 14, in crawler
    with open("./"+str(page)+".txt","a+") as f:
IOError: [Errno 2] No such file or directory: 'http://www.yelp.com/.txt'
I thought that open(file,"a+") is supposed to create and write. What am I doing wrong?
Upvotes: 0
Views: 43
Reputation: 15336
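If you want to use the URL as the basis for the file name, you should encode the URL first. As written, the slashes in 'http://www.yelp.com/.txt' are treated as directory separators, so open() looks for directories that do not exist; the "a+" mode creates a missing file, but not missing directories. Encoding converts slashes (among other characters) into character sequences that won't interfere with the file system or shell.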
The urllib library can help with this.
So, for example:
>>> import urllib
>>> urllib.quote_plus('http://www.yelp.com/')
'http%3A%2F%2Fwww.yelp.com%2F'
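Applied to the code in the question, this could look roughly like the sketch below. The helper names url_to_filename and save_page are made up for illustration and are not part of urllib, and the import fallback is only there on the assumption that the code might also be run on Python 3, where quote_plus lives in urllib.parse.

```python
import os

try:
    from urllib.parse import quote_plus  # Python 3
except ImportError:
    from urllib import quote_plus  # Python 2, as used in the question


def url_to_filename(url):
    # Hypothetical helper: percent-encode the URL so it contains no slashes.
    return quote_plus(url) + ".txt"


def save_page(url, content, directory="."):
    # Hypothetical helper: append the raw page bytes to a file named after the URL.
    path = os.path.join(directory, url_to_filename(url))
    with open(path, "ab") as f:  # the with block closes the file on exit
        f.write(content)
    return path
```

In the crawler, a call like save_page(page, s) would then take the place of the with open(...) block. The explicit f.close() is not needed, since the with statement already closes the file.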
Upvotes: 5