gdogg371

Reputation: 4122

Removing carriage returns from Scrapy screen output

I am using Python.org version 2.7 (64-bit) on Windows Vista (64-bit) to run Scrapy. I am using the following to remove \n and \r characters and HTML tags from my screen output:

body = response.xpath("//p").extract()
body2 = str(body)
body3 = re.sub(r'\s{2,}', ' ', body2)
print remove_tags(body3)

This removes the HTML tags fine, but the \r and \n characters are still present in the final output. Am I doing something wrong?

Thanks

Upvotes: 0

Views: 903

Answers (2)

blackwind

Reputation: 182

What you need is this regex:

(\\[rn]|\s){2,}

Try it out and let me know if it works.
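
To illustrate why a pattern that accepts a literal backslash can help here, the sketch below compares the original pattern with the one suggested above. It assumes the cause is that `str()` on a list returns the list's repr, in which newline characters appear as the two-character sequences `\r` and `\n`. The sample list is a hypothetical stand-in for what `extract()` might return, not real page content.

```python
import re

# Hypothetical stand-in for the list returned by response.xpath("//p").extract().
body = ['<p>Hello\r\n   world</p>']

# str() on a list gives its repr, so the carriage return and line feed
# become the literal two-character sequences backslash-r and backslash-n.
body2 = str(body)

# \s cannot match those literal sequences, so the original pattern
# only collapses the run of spaces and leaves them in place.
original = re.sub(r'\s{2,}', ' ', body2)

# The suggested pattern also accepts a backslash followed by r or n,
# so the escape sequences and the spaces are replaced together.
suggested = re.sub(r'(\\[rn]|\s){2,}', ' ', body2)
```

Because of the `{2,}` quantifier, a single isolated escape sequence with no neighbouring whitespace would still be left alone by this pattern.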

Upvotes: 1

Casimir et Hippolyte

Reputation: 89567

Yes. Since you are not sure what type of newline the document contains, you should replace your pattern with:

\s{2,}|[\r\n]

Indeed, newlines are usually either CRLF (the Windows convention) or LF alone (the Unix convention, which is probably the case with your current document), and occasionally CR alone (older Apple operating systems).
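
As a rough sketch of how this pattern could be applied, the helper below joins the extracted strings instead of calling `str()` on the list, so the text contains real newline characters for the pattern to match. The function name `collapse_whitespace` and the sample input are assumptions made for illustration; they are not part of Scrapy.

```python
import re

def collapse_whitespace(fragments):
    """Join extracted strings and replace newlines and whitespace runs with one space."""
    # Joining keeps real \r and \n characters, unlike str() on a list,
    # which would turn them into literal backslash sequences.
    text = ' '.join(fragments)
    # \s{2,} collapses runs of whitespace (including CRLF pairs);
    # [\r\n] catches any single CR or LF that stands alone.
    return re.sub(r'\s{2,}|[\r\n]', ' ', text).strip()

# Hypothetical input covering CRLF, LF-only and CR-only line endings.
cleaned = collapse_whitespace(['one\r\ntwo', 'three\nfour\rfive   six'])
```

The result could then be passed to `remove_tags` as in the question.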

Upvotes: 1
