Reputation: 11285
I'm trying to parse an AndroidManifest.xml file for some analysis. What's the best way to parse this? So far this is what I'm doing:
import string
test = string.printable
f = open('AndroidManifest.xml', 'r').read()
x = ""
for n in f:
    if n in test:
        x += n
print x
And the result is more or less:
d
74Rv
vzPVZVL :Pd>P l
versionCode
minSdkVersiontargetSdkVersionnameiconlabel versionName
configChangespriorityandroid*http://schemas.android.com/apk/res/androidpackagemanifestngjvnpslnp.iplhmk1.0uses-sdkuses-permission#android.permission.READ_PHONE_STATE'android.permission.ACCESS_NETWORK_STATEandroid.permission.
That's just a portion of it. As you can see, it's pretty damn ugly. Any help would be appreciated.
EDIT:
I get this strange traceback when I use minidom's parse:
Traceback (most recent call last):
  File "test2.py", line 4, in <module>
    dom = parse(f)
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 1914, in parse
    return expatbuilder.parse(file)
  File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 928, in parse
    result = builder.parseFile(file)
  File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 0
Upvotes: 0
Views: 2147
Reputation: 9223
I would suggest parsing it with an XML parser rather than treating it as plain text.
Here's some excellent documentation on minidom.
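A rough sketch of what that could look like, assuming the manifest is plain-text XML. Keep in mind that the traceback in the question fails at line 1, column 0, which suggests the file isn't text XML at all: a manifest pulled straight out of an APK is stored in Android's compiled binary XML format and would need decoding first (apktool is one tool that can do this). The sample manifest and the helper names looks_like_text_xml and summarize_manifest are made up for illustration, and the leading-byte check is only a crude heuristic.

```python
from xml.dom import minidom

ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical sample standing in for a decoded, plain-text manifest.
SAMPLE_MANIFEST = b"""<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.app" android:versionCode="1" android:versionName="1.0">
    <uses-sdk android:minSdkVersion="8" android:targetSdkVersion="17"/>
    <uses-permission android:name="android.permission.READ_PHONE_STATE"/>
    <uses-permission android:name="android.permission.ACCESS_NETWORK_STATE"/>
</manifest>
"""


def looks_like_text_xml(data):
    """Crude check: textual XML should have '<' as its first non-space byte."""
    return data.lstrip().startswith(b"<")


def summarize_manifest(data):
    """Pull the package name, version name and permissions out of manifest bytes."""
    if not looks_like_text_xml(data):
        # A binary (compiled) manifest ends up here instead of in an ExpatError.
        raise ValueError("not plain-text XML; decode the binary manifest first")
    root = minidom.parseString(data).documentElement
    permissions = [
        node.getAttributeNS(ANDROID_NS, "name")
        for node in root.getElementsByTagName("uses-permission")
    ]
    return {
        "package": root.getAttribute("package"),
        "versionName": root.getAttributeNS(ANDROID_NS, "versionName"),
        "permissions": permissions,
    }


summary = summarize_manifest(SAMPLE_MANIFEST)
print(summary)
```

To try it on a real file, read it in binary mode, e.g. summarize_manifest(open('AndroidManifest.xml', 'rb').read()). If that raises the ValueError, the file is most likely still in the compiled format.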
Upvotes: 2