Ali Malik

Reputation: 85

Removing spaces and non-printable characters in Python

I am working with an XML file using the lxml etree xpath method. My code is:

from lxml import etree
File = r"c:\file.xml"  # raw string, so \f is not read as a form feed
doc=etree.parse(File)
alltext = doc.xpath('descendant-or-self::text()')
clump = "".join(alltext)
clump

I got the following output:

             "'\n\t\n\t\t\n\t\t\n\t\t\n\t\t\n\t\n\t\n\t\t\t\n\t\n\t\t\n\t\t\t\n\t\t\t\tIntroduction\n\t\t\t\n\t\t\t\n\t\t\n\t\t\n\t\t\t\n\t\t\t\tAccessibility\n\t\t\t\n\t\t\t\n\t\t\n\t\t\n\t\t\t\n\t\t\t\tOpening eBooks\n\t\t\t\n\t\t\t\n\t\t\t\n\t\t\t\

I want to remove the newlines and tabs from the output, so I tried the following code, but it did not give the desired output:

import string
filter(lambda x: x in string.printable, clump)

I only want the text from the output, which is "Introduction , Accessibility , Opening eBooks".

Upvotes: 0

Views: 785

Answers (2)

Sharif Mamun

Reputation: 3554

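You can try this:

''.join(clump.split())

Note that this removes every whitespace character, including the space inside "Opening eBooks"; use ' '.join(clump.split()) if the words should stay separated by single spaces. Alternatively, you can use re, as in Sabuj's answer: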

>>> import re
>>> re.sub(r'[\n\t]+', ' ', clump.strip())
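If the goal is the list of titles rather than one cleaned string, splitting on line breaks may be simpler. This is only a sketch: the `clump` value below is a shortened, hypothetical stand-in, because the real output is truncated in the question.

```python
# Shortened, hypothetical stand-in for the clump string from the question.
clump = "\n\t\n\t\t\n\t\t\t\tIntroduction\n\t\t\t\n\t\t\n\t\t\t\tAccessibility\n\t\t\t\n\t\t\t\tOpening eBooks\n\t\t\t\n"

# Split on line breaks, trim each piece, and drop pieces that were only whitespace.
titles = [line.strip() for line in clump.splitlines() if line.strip()]

# Join the surviving titles with the separator of your choice.
result = " , ".join(titles)
print(result)
```

Because each line is stripped rather than split on every whitespace character, a multi-word title such as "Opening eBooks" keeps its internal space.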

Upvotes: 0

Sabuj Hassan

Reputation: 39365

If you don't mind using a regex:

import re
clump = re.sub(r'[\n\t]+', ' ', clump)

To remove other characters as well, just add them inside the [] character class.
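As a rough illustration of extending the class, the sketch below adds `\r` and a space to it, using a made-up `sample` string rather than the real data. It also checks why the `string.printable` filter from the question has no effect: that constant itself contains the tab and newline characters.

```python
import re
import string

# Hypothetical sample; the real clump string is truncated in the question.
sample = "\n\t\t Introduction\r\n\t\t\tOpening eBooks\n\t"

# string.printable includes whitespace, so filtering on it keeps \n and \t.
assert "\n" in string.printable and "\t" in string.printable

# Collapse each run of newline, tab, carriage return, or space into one space,
# then trim the single space left over at each end.
cleaned = re.sub(r'[\n\t\r ]+', ' ', sample).strip()
print(cleaned)
```

Since a run of whitespace is replaced by one space rather than deleted, the words of a multi-word title stay separated.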

Upvotes: 3
