Reputation: 212
I'm trying to save a bunch of pages in a folder next to the .py file that creates them. I'm on Windows, so when I try to write the trailing backslash of the file path, it gets treated as a special character instead.
Here's what I'm talking about:
from bs4 import BeautifulSoup
import urllib2, urllib
import csv
import requests
from os.path import expanduser
print "yes"
with open('intjpages.csv', 'rb') as csvfile:
    pagereader = csv.reader(open("intjpages.csv","rb"))
    i=0
    for row in pagereader:
        print row
        agentheader = {'User-Agent': 'Nerd'}
        request = urllib2.Request(row[0],headers=agentheader)
        url = urllib2.urlopen(request)
        soup = BeautifulSoup(url)
        for div in soup.findAll('div', {"class" : "side"}):
            div.extract()
        body = soup.find_all("div", { "class" : "md" })
        name = "page" + str(i) + ".html"
        path_to_file = "\cleanishdata\"
        outfile = open(path_to_file + name, 'w')
        #outfile = open(name,'w') #this works fine
        body=str(body)
        outfile.write(body)
        outfile.close()
        i+=1
I can save the files to the same folder that the .py file is in, but when I process the files using RapidMiner it picks up the program too. It would also be neater to save them in their own directory.
I am surprised this hasn't already been answered on the entire internet.
EDIT: Thanks so much! I ended up using information from both of your answers. IDLE was making me use r'\string\' to concatenate the strings with the backslashes. I needed to use the path_to_script technique of abamert to solve the problem of creating a new folder wherever the .py file is. Thanks again! Here are the relevant code changes:
name = "page" + str(i) + ".txt"
path_to_script_dir = os.path.dirname(os.path.abspath("links.py"))
newpath = path_to_script_dir + r'\\' + 'cleanishdata'
if not os.path.exists(newpath): os.makedirs(newpath)
outfile = open(path_to_script_dir + r'\\cleanishdata\\' + name, 'w')
body=str(body)
outfile.write(body)
outfile.close()
i+=1
Upvotes: 0
Views: 1190
Reputation: 10579
Are you sure you're escaping your backslashes properly?
The \" in your string "\cleanishdata\" is actually an escaped quote character (").
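You probably want
"\\cleanishdata\\"
A raw string alone won't help here: a raw string literal cannot end in a single backslash, so r"\cleanishdata\" is a syntax error too. If you prefer raw strings, write
r"\cleanishdata" + "\\"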
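To make the escaping rule concrete, here is a small sketch of how the doubled backslashes behave. The variable names (folder, raw_folder, full_name) are only illustrative and are not part of the question's code.

```python
# Each "\\" in a normal string literal stands for ONE backslash character.
folder = "\\cleanishdata\\"
backslash = chr(92)  # the backslash character, written without any escaping

# The string starts and ends with a single backslash.
assert folder == backslash + "cleanishdata" + backslash
assert len(folder) == len("cleanishdata") + 2

# A raw string may contain backslashes but cannot end in one,
# so the trailing backslash is appended as a normal escaped string.
raw_folder = r"\cleanishdata" + "\\"
assert raw_folder == folder

# Concatenating the page name now gives the intended relative path.
name = "page" + str(0) + ".html"
full_name = folder + name
```

Note that a path starting with a backslash is resolved from the root of the current drive on Windows, not from the script's folder, which is one more reason to build the path with os.path as described below.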
You probably also want to check out the os.path module, particularly os.path.join and os.path.dirname.
For example, if your file is in C:\Base\myfile.py and you want to save files to C:\Base\cleanishdata\output.txt, you'd want:
import os
import sys

os.path.join(
    os.path.dirname(os.path.abspath(sys.argv[0])),  # C:\Base
    'cleanishdata',
    'output.txt')
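As a rough check of what that call produces, the sketch below uses ntpath, the Windows implementation of os.path, so it gives the same result on any platform. The hardcoded script location is just the example path from this answer, standing in for sys.argv[0].

```python
import ntpath  # the Windows flavour of os.path, importable on any platform

# Example location from the answer above (an assumption, stands in for sys.argv[0]).
script = r"C:\Base\myfile.py"

# dirname drops the file name and leaves the folder, without a trailing backslash.
script_dir = ntpath.dirname(script)

# join inserts the separators, so no backslash has to be typed by hand.
target = ntpath.join(script_dir, "cleanishdata", "output.txt")
```

On a real Windows machine os.path behaves like ntpath, so the same calls work there with os.path in place of ntpath.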
Upvotes: 3
Reputation: 365747
A better solution than hardcoding the path to the .py file is to just ask Python for it:
import sys
import os
path_to_script = sys.argv[0]
path_to_script_dir = os.path.dirname(os.path.abspath(path_to_script))
Also, it's generally better to use os.path functions instead of string manipulation:
outfile = open(os.path.join(path_to_script_dir, name), 'w')
Besides making your program continue to work as expected even if you move it to a different location or install it on another machine or give it to a friend, getting rid of the hardcoded paths and the string-based path concatenation means you don't have to worry about backslashes anywhere, and this problem never arises in the first place.
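Putting the pieces together, here is one possible sketch that also creates the cleanishdata subfolder from the question. The helper names output_path and save_page are hypothetical, not from the original code, and the sketch avoids version-specific syntax so it should run on both Python 2 and Python 3.

```python
import os


def output_path(script_path, folder, name):
    """Return <script dir>/<folder>/<name>, creating the folder if needed."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    out_dir = os.path.join(script_dir, folder)
    # Create the output folder the first time it is needed.
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    return os.path.join(out_dir, name)


def save_page(script_path, index, body):
    """Write one page body to cleanishdata/page<index>.html next to the script."""
    path = output_path(script_path, "cleanishdata", "page" + str(index) + ".html")
    # The with block closes the file even if the write fails.
    with open(path, "w") as outfile:
        outfile.write(body)
    return path
```

Inside the question's loop this would be called as save_page(sys.argv[0], i, str(body)), replacing the manual open, write and close calls.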
Upvotes: 2