Reputation: 35
Python 2.7
I want to open a file with the default application that my operating system (Windows 7) has registered for its file type. I am using os.startfile for this.
The problem is related to character encodings. I have spent hours on it but haven't found a solution.
# -*- coding: utf-8 -*-
import xml.etree.cElementTree as etree
import os

class Session:
    'Session'
    def __init__(self, xmlfile):
        parser = etree.XMLParser(encoding="utf-8")
        self.tree = etree.parse(xmlfile, parser=parser)
        self.root = self.tree.getroot()

    def get_documents(self):
        return self.root.findall('document')

session = Session('sessionutf8.xml')
for doc in session.get_documents():
    print doc.text.encode('utf-8')
    os.startfile(doc.text.encode('iso 8859-1'))
The input XML:
<?xml version="1.0" encoding="utf-8"?>
<session>
	<name> Statistikk </name>
	<document>
		C:\Users\Jens\Documents\Vår 2014\TMA4245 Statistikk\Probability & Statistics for Engineers & Scientists (9th Edition) - Walpole.pdf
	</document>
</session>
The output:
%USERPROFILE%\Documents\My Python scripts\> python session.py
C:\Users\Jens\Documents\Vår 2014\TMA4245 Statistikk\Probability & Statistics for Engineers & Scientists (9th Edition) - Walpole.pdf
Traceback (most recent call last):
  File "session.py", line 19, in <module>
    os.startfile(doc.text.encode('iso 8859-1'))
WindowsError: [Error 2] The system cannot find the file specified: '\n\t\tC:\\Users\\Jens\\Documents\\V\xe5r 2014\\TMA4245 Statistikk\\Probability & Statistics for Engineers & Scientists (9th Edition) - Walpole.pdf\n\t'
Process python exited with code 1
So I can print the file name containing the character 'å' correctly in the console, but I am unable to pass it to Windows in a form that it accepts.
What makes it even more confusing is that the following code works:
book = u'C:\\Users\Jens\Documents\Vår 2014\TMA4245 Statistikk\Probability & Statistics for Engineers & Scientists (9th Edition) - Walpole.pdf'
os.startfile(book.encode('iso 8859-1'))
This code opens the PDF document in Adobe Reader as expected (well, I'm not really expecting anything after writing a line of code at this point, mostly hoping and praying).
I've tried all kinds of combinations of ISO 8859-1 and UTF-8, both in encode() and in the XML file. I've been trying to read up on these things, but I'm still confused.
Note that this is my first Python program ever; I have programmed in Java for a few years. There may be things here I shouldn't be doing, so feel free to suggest other ways of achieving my goal, which is to open a file in whatever application is set as the default in my OS and then return to my program. I don't need a reference to the new process or anything like that. Just open the document and move on.
Upvotes: 2
Views: 3069
Reputation: 5202
WindowsError: [Error 2] The system cannot find the file specified: '\n\t\tC:\\Users\\Jens\\Documents\\V\xe5r 2014\\TMA4245 Statistikk\\Probability & Statistics for Engineers & Scientists (9th Edition) - Walpole.pdf\n\t'
As you can see here, you have leading and trailing whitespace characters in your variable; you can remove these with strip(). They come from the newlines and indentation you used inside your node.
os.startfile(doc.text.strip())
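To see where the whitespace comes from, here is a small sketch that parses an inline stand-in for your XML file and compares the raw node text with the stripped text. The shortened path and the inline document are assumptions made for illustration, not your actual file. It should run on both Python 2.7 and Python 3:

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import xml.etree.ElementTree as etree

# Hypothetical, shortened stand-in for sessionutf8.xml.
# The tabs and newlines around the path are kept on purpose.
XML = (
    b'<?xml version="1.0" encoding="utf-8"?>\n'
    b'<session>\n'
    b'\t<document>\n'
    b'\t\tC:\\Users\\Jens\\Documents\\V\xc3\xa5r 2014\\Walpole.pdf\n'
    b'\t</document>\n'
    b'</session>\n'
)

root = etree.fromstring(XML)
raw = root.find('document').text   # text exactly as stored in the node
clean = raw.strip()                # leading/trailing whitespace removed

print(repr(raw))    # begins with '\n\t\t' and ends with '\n\t'
print(repr(clean))  # only the path remains
```

The parser keeps the node text as it appears in the file, so any pretty-printing inside the element ends up in the string you pass to os.startfile.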
Upvotes: 0
Reputation: 76244
From your error:
cannot find the file specified: '\n\t\tC:\\Users...
Note the \n\t\t. It looks like the whitespace preceding and following your path name is preserved when it's pulled from the XML. You ought to strip it out.
os.startfile(doc.text.strip().encode('iso 8859-1'))
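If it helps, the stripping can be wrapped in a small helper that also checks that the file exists before handing it to os.startfile. This is only a sketch under a few assumptions: clean_paths and open_documents are names made up for this example, the inline XML is a hypothetical stand-in for your session file, and the hasattr guard is there because os.startfile exists only on Windows. The path is passed as text without an explicit encode(); whether you still need the encode step on your setup is something to verify yourself.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import os
import xml.etree.ElementTree as etree


def clean_paths(root):
    """Return the stripped text of every <document> child, skipping empty ones."""
    paths = []
    for doc in root.findall('document'):
        text = (doc.text or '').strip()  # doc.text is None for <document/>
        if text:
            paths.append(text)
    return paths


def open_documents(paths):
    """Open each existing file with its default application (Windows only)."""
    for path in paths:
        if not os.path.isfile(path):
            print('Skipping missing file: %r' % path)
        elif hasattr(os, 'startfile'):
            os.startfile(path)
        else:
            print('os.startfile is not available on this platform')


if __name__ == '__main__':
    # Hypothetical inline document standing in for the real session file.
    root = etree.fromstring(
        b'<session><document>\n\t\tC:\\no\\such\\file.pdf\n\t</document>'
        b'<document>   </document></session>'
    )
    open_documents(clean_paths(root))
```

Checking os.path.isfile first gives a readable message for a bad path instead of a WindowsError traceback.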
Upvotes: 2