I'm trying to keep the line breaks of plain-text content when I write it into an HTML file.
I get results from boilerpipe in this way:
class BottomPipeResult :
    AGENT_ID = "Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"
    BOTTOMPIPE_URL = "http://boilerpipe-web.appspot.com/extract?url={0}&extractor=LargestContentExtractor&output=text"
    #BOTTOMPIPE_URL = "http://boilerpipe-web.appspot.com/extract?url={0}&extractor=ArticleExtractor&output=htmlFragment"
    _myBPPage = ""

    # scrape and get results from bottompipe
    def scrapeResult(self, theURL, user_agent=AGENT_ID) :
        request = urllib2.Request(self.BOTTOMPIPE_URL.format(theURL))
        if user_agent:
            request.add_header("User-Agent", user_agent)
        pagefile = urllib2.urlopen(request)
        realurl = pagefile.geturl()
        f = pagefile
        self._myBPPage = f.read()
        return(self._myBPPage)
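The class above relies on urllib2, which exists only in Python 2. A minimal Python 3 sketch of the same request follows, assuming urllib.request and that the boilerpipe web service still answers at the same address. Unlike the original class, it percent-encodes the target URL (as in the example link below) and decodes the response with the charset the server declares, falling back to UTF-8 as an assumption. build_request and scrape_result are hypothetical names.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

AGENT_ID = "Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"
BOTTOMPIPE_URL = ("http://boilerpipe-web.appspot.com/extract"
                  "?url={0}&extractor=LargestContentExtractor&output=text")


def build_request(the_url, user_agent=AGENT_ID):
    """Build the boilerpipe request (hypothetical helper name)."""
    # Percent-encode the whole target URL so its ':' and '/' survive as a query value.
    request = Request(BOTTOMPIPE_URL.format(quote(the_url, safe="")))
    if user_agent:
        request.add_header("User-Agent", user_agent)
    return request


def scrape_result(the_url, user_agent=AGENT_ID):
    """Fetch the extracted text as str (needs network access)."""
    with urlopen(build_request(the_url, user_agent)) as response:
        # Use the charset from the Content-Type header; assume UTF-8 if it is absent.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Splitting the request construction from the fetch keeps the URL handling testable without touching the network.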
But when I write them out to HTML, I lose all the line breaks.
Here's the code I use to write to the HTML file:
f = open('./../../entries-new.html', 'a')
f.write(BottomPipeResult.scrapeResult(myLinkResult))
f.close()
Here is an example of a boilerpipe text result:
http://boilerpipe-web.appspot.com/extract?url=http%3A%2F%2Fresult.com&extractor=ArticleExtractor&output=text
I tried this, but it doesn't work:
myLinkResult = re.sub('\n','<br />', myLinkResult)
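In the attempt above, re.sub is applied to myLinkResult, which, judging from the call scrapeResult(myLinkResult), is the URL being scraped rather than the text that gets written, so the file content would not change. A minimal Python 3 sketch of doing the substitution on the scraped text instead, assuming that text is already a str. It also escapes &, < and > first so the text cannot break the surrounding markup. text_to_html_with_br is a hypothetical name.

```python
import html


def text_to_html_with_br(text):
    """Escape plain text and turn each line break into <br />."""
    escaped = html.escape(text, quote=False)  # & < > become entities
    # Normalise Windows and old Mac line endings to \n first.
    escaped = escaped.replace("\r\n", "\n").replace("\r", "\n")
    # Keep the \n after each tag so the generated source stays readable.
    return escaped.replace("\n", "<br />\n")


print(text_to_html_with_br("First line\nSecond line & more"))
```

In the question's code, the value to pass through this function is the one returned by scrapeResult, not myLinkResult.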
Any suggestions?
Thanks
You could wrap the text in a <pre> tag. This tells the browser that the text is preformatted, so its whitespace and line breaks are displayed as written.
For example:
<pre>Your text
With line feeds
and other things
</pre>
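Applied to the scraped text, this could look like the following Python 3 sketch, assuming the text is already a str. Escaping is still needed inside <pre>, because the element preserves whitespace but a stray < in the text would still be parsed as markup. wrap_in_pre is a hypothetical helper.

```python
import html


def wrap_in_pre(text):
    """Return the text as a <pre> block so the browser keeps its line breaks."""
    # <pre> preserves whitespace, not markup, so escape & < > first.
    return "<pre>" + html.escape(text, quote=False) + "</pre>\n"


print(wrap_in_pre("Your text\nWith line feeds\nand other things"))
```

Note that <pre> renders in a monospace font by default and does not wrap long lines; the CSS rule white-space: pre-wrap on an ordinary element keeps the line breaks while still wrapping.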
I modified your code slightly so that it runs, and it seems to "work" for me: the resulting file has line endings where expected. I see some encoding issues, but no line-ending issues.
import urllib2

class BottomPipeResult :
    AGENT_ID = "Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"
    BOTTOMPIPE_URL = "http://boilerpipe-web.appspot.com/extract?url={0}&extractor=LargestContentExtractor&output=text"
    _myBPPage = ""

    # scrape and get results from bottompipe
    def scrapeResult(self, theURL, user_agent=AGENT_ID) :
        request = urllib2.Request(self.BOTTOMPIPE_URL.format(theURL))
        if user_agent:
            request.add_header("User-Agent", user_agent)
        pagefile = urllib2.urlopen(request)
        realurl = pagefile.geturl()
        f = pagefile
        self._myBPPage = f.read()
        return(self._myBPPage)

bpr = BottomPipeResult()
myLinkResult = 'http://result.com'
f = open('out.html', 'a')
f.write(bpr.scrapeResult(myLinkResult))
f.close()
The resulting out.html contains:
Result-Expand.flv
We want to help your company grow. Our Result offices around the world can help you expand your business faster and more cost efficiently. And at the same time bring the experience of having expanded more than 150 companies during the past ten years.
Result can help you grow in your local market, regionally, or globally through our team of experienced business builders, our industry know-how and our know-who.
Our services range from well designed expansion strategies to assuming operational responsibility for turning these strategies into successful business.
We don’t see ourselves as mere consultants who give you a strategy presentation and then leave you to your own devices. We prefer to be considered as an extended, entirely practical arm of your management team. We’re hands-on and heads-on. We’re business builders.
We’re co-entrepreneurs. This is also reflected in our compensation structure – a significant part of our compensation is result  based.
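About the encoding issues: the sample text above contains curly apostrophes and dashes. A common cause of such issues (an assumption here, not verified against this page) is that the generated HTML never declares its encoding, so the browser has to guess. A minimal Python 3 sketch that builds a complete page with a UTF-8 declaration and writes it with the same encoding; build_html_page, write_html_page and the default title are hypothetical.

```python
import html


def build_html_page(text, title="entries"):
    """Return a complete HTML page that declares UTF-8 and keeps line breaks."""
    body = html.escape(text, quote=False)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><title>'
        + html.escape(title) + "</title></head>\n"
        "<body><pre>" + body + "</pre></body></html>\n"
    )


def write_html_page(path, text):
    """Write the page using the same encoding its <meta charset> declares."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(build_html_page(text))
```

The file is opened in "w" mode rather than "a" on purpose: appending a second complete page to the same file would leave two <html> documents in one file.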
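The line endings are in the file, but a browser treats them as ordinary whitespace when rendering HTML. So as far as the HTML output is concerned, you probably want to wrap each line in a <p> (paragraph) tag: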
output = bpr.scrapeResult(myLinkResult)
f = open('out.html', 'a')
f.write('\n'.join(['<p>' + x + '</p>' for x in output.split('\n')]))
f.close()
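A slightly more defensive variant of that one-liner, as a Python 3 sketch assuming the text is already a str: it escapes each line and skips blank lines, on the assumption that empty <p></p> elements are not wanted. text_to_paragraphs is a hypothetical name.

```python
import html


def text_to_paragraphs(text):
    """Wrap each non-empty line of plain text in its own <p> element."""
    # splitlines() splits on \n, \r\n and \r (and a few other Unicode line boundaries).
    lines = text.splitlines()
    # Escape each line; skip blank ones, which would give empty <p></p>.
    return "\n".join("<p>" + html.escape(line, quote=False) + "</p>"
                     for line in lines if line.strip())


print(text_to_paragraphs("One\n\nTwo & three\r\n"))
```

Using splitlines() instead of split('\n') also avoids leaving a stray \r at the end of each paragraph when the text has Windows line endings.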