John

Reputation: 8548

How to get a good web reader for iOS

Given a webpage, I would like to extract the text for a reader view. I am aware that SFSafariViewController offers a reader mode, but for my application, I need the actual text string. I am also aware of the Mercury parser, but I prefer a solution that runs locally.

I have tried many options:

luin/Readability looks very interesting, and it seems to be a very active GitHub project. However, I could not make it work under iOS. What I tried:

I installed browserify and used it to produce a stand-alone JavaScript file. However, loading that file fails with the error "Error: Mismatched anonymous define() module". I read that this problem may be solved by using derequire. I tried it but did not succeed.

Can anyone give me some advice on how to make luin/Readability work on iOS, possibly by using browserify or in any other way?

Upvotes: 0

Views: 470

Answers (1)

petard

Reputation: 333

I had a similar problem in a project that needed to render the HTML produced by Readability in a text view. My initial approach was to load the page in a WKWebView and inject a slightly modified copy of Mozilla Readability using WKWebView's evaluateJavaScript.

The Mozilla Readability code was stored as a local file and modified by appending the following code:

// Execute Readability on the currently loaded DOM

// Describe the page URL in the form this Readability version expects
var uri = {
    spec: location.href,
    host: location.host,
    prePath: location.protocol + "//" + location.host,
    scheme: location.protocol.substr(0, location.protocol.indexOf(":")),
    pathBase: location.protocol + "//" + location.host + location.pathname.substr(0, location.pathname.lastIndexOf("/") + 1)
};
// Parse a clone so the live DOM is left untouched
var documentClone = document.cloneNode(true);
var article = new Readability(uri, documentClone).parse();
// The last expression is the value handed back to evaluateJavaScript
article;
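
If it helps to check the URL fields outside a web view, the same logic can be pulled into a small function. This is only a sketch: `buildReadabilityUri` is a hypothetical name, not part of Readability, and it assumes the two-argument `(uri, document)` constructor used above. Other versions of the library may take different arguments, so check the copy you embed.

```javascript
// Hypothetical helper (not part of Readability): builds the uri object
// from any location-like value exposing href, host, protocol and pathname.
function buildReadabilityUri(loc) {
    var origin = loc.protocol + "//" + loc.host;
    return {
        spec: loc.href,
        host: loc.host,
        prePath: origin,
        // "https:" -> "https"
        scheme: loc.protocol.slice(0, loc.protocol.indexOf(":")),
        // everything up to and including the last "/" of the path
        pathBase: origin + loc.pathname.slice(0, loc.pathname.lastIndexOf("/") + 1)
    };
}

// In the page: var uri = buildReadabilityUri(window.location);
```

Because the function only reads properties, it also accepts a `URL` object, which makes it easy to try in Node before injecting it.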

The resulting content is then rendered using DTCoreText. WKWebView loads all resources of the webpage, including images, ads, etc., which makes this approach very memory intensive. I tried to work around this by parsing the HTML and removing the images before passing it to WKWebView. Overall this works, but depending on your use case it might not be very elegant or fast.

Currently I'm using a different approach: running luin/Readability on a server using PhantomJS. This gives better results in terms of content extraction and is much less memory intensive on the client.
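
For illustration, removing images from the HTML string before it is loaded could look roughly like the sketch below. `stripImages` is a hypothetical helper, and it uses regular expressions rather than a real HTML parser, so it is an assumption about one possible way to do this, not the code I actually used. It will miss CSS background images and anything added later by scripts.

```javascript
// Hypothetical helper: drops <picture> blocks and <img> tags from an HTML
// string so the web view has no image resources to fetch.
// Regex-based on purpose to stay short; not a robust HTML parser.
function stripImages(html) {
    return html
        // remove whole <picture>...</picture> blocks first
        .replace(/<picture\b[\s\S]*?<\/picture>/gi, "")
        // then remove any remaining stand-alone <img> tags
        .replace(/<img\b[^>]*>/gi, "");
}
```

The cleaned string can then be passed to the web view together with the original page URL as the base URL, so relative links still resolve.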

Upvotes: 3
