RedVelvet

Reputation: 1903

Using html2text and cleaning up the text in Python

I'm using html2text to convert HTML into plain text. It works very well, but I can't find many examples or much documentation online.

I'm reading the user's name this way:

import html2text

# hxs is the page selector, defined elsewhere
text_to_gain = hxs.xpath('//div[contains(@id,"yq-question-detail-profile-img")]/a/img/@alt').extract()
if text_to_gain:
    h = html2text.HTML2Text()
    h.ignore_links = True
    item['author'] = h.handle(text_to_gain[0])
else:
    item['author'] = "anonymous"

But my output is this:

u'Duncan\n\n'

The \n is useful when I read a long text or message, but for a single string like this one I want to keep only the name:

'Duncan'

Upvotes: 2

Views: 1492

Answers (2)

monklof

Reputation: 171

You can also do it like this: just remove the '\n' characters:

>>> st = 'Duncan\n\n'
>>> st.replace('\n', '')
'Duncan'
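One thing to keep in mind: replace() removes every '\n', not only the trailing ones. That is fine for a single name, but it would also join the lines of a longer text. A small sketch of the difference, where the message string is a made-up value standing in for longer converted text:

```python
# Hypothetical multi-line value, standing in for longer converted text.
message = "First line\nSecond line\n\n"

# replace() deletes every newline, so the two lines run together.
joined = message.replace("\n", "")

# rstrip("\n") drops only the trailing newlines and keeps the inner one.
trimmed = message.rstrip("\n")

print(repr(joined))   # 'First lineSecond line'
print(repr(trimmed))  # 'First line\nSecond line'
```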

Upvotes: 0

JRodDynamite

Reputation: 12613

Use the strip() method. It removes all leading and trailing whitespace, including the newlines.

>>> a = u'Duncan\n\n'
>>> a
u'Duncan\n\n'
>>> a.strip()
u'Duncan'
>>> str(a.strip())
'Duncan'
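Applied to the code in the question, this could look roughly like the sketch below. The helper name clean_author and its default argument are made up for illustration; the string passed in stands for whatever h.handle(...) returned.

```python
def clean_author(converted, default="anonymous"):
    """Return the name without surrounding whitespace, or the default.

    clean_author is a hypothetical helper, not part of html2text.
    """
    # strip() removes leading and trailing whitespace, including '\n'.
    name = converted.strip() if converted else ""
    # Fall back to the default when nothing is left after stripping.
    return name or default


print(clean_author("Duncan\n\n"))  # Duncan
print(clean_author("\n\n"))        # anonymous
print(clean_author(None))          # anonymous
```

In the question's code the assignment would then become item['author'] = clean_author(h.handle(text_to_gain[0])).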

Upvotes: 5
