Katsumi

Reputation: 33

Extract a JSON string from HTML using only iOS APIs

I want to extract a JSON string from an HTML document without using a third-party framework. I'm building an iOS framework and do not want it to depend on one.

Example url: http://www.nicovideo.jp/watch/sm33786214

In that html, there is a line:

I need to extract JSON_String_I_want_to extract and convert it to a JSON object.

With the third-party framework Kanna, it looks like this:



    if let doc = Kanna.HTML(html: html, encoding: String.Encoding.utf8) {
        if let descNode = doc.css("#js-initial-watch-data[data-api-data]").first {
            let dataApiData = descNode["data-api-data"]
            if let data = dataApiData?.data(using: .utf8) {
                if let json = try? JSON(data: data, options: JSONSerialization.ReadingOptions.mutableContainers) {
                    // use json here
                }
            }
        }
    }

I searched the web for similar questions but was unable to apply the answers to my case. (I have to admit I don't fully understand regular expressions.)



    if let html = String(data: data, encoding: .utf8) {
        let pattern = "data-api-data=\"(.*?)\".*?>"
        let regex = try! NSRegularExpression(pattern: pattern, options: .caseInsensitive)
        let matches = regex.matches(in: html, options: [], range: NSMakeRange(0, html.count))
        var results: [String] = []
        matches.forEach { (match) -> () in
            results.append( (html as NSString).substring(with: match.rangeAt(1)) )
        }
        if let stringJSON = results.first {
            let d = stringJSON.data(using: String.Encoding.utf8)
            if let json = try? JSONSerialization.jsonObject(with: d!, options: []) as? Any {
                // it does not get here...
            }
        }
    }

Does anyone have experience extracting a value from HTML and converting it to JSON?

Thank you.

Upvotes: 1

Views: 1518

Answers (1)

OOPer

Reputation: 47886

Your pattern looks fine. The problem is that HTML attribute values may contain character entities (for example, &quot; in place of a double quote).

You need to replace them with the actual characters before parsing the string as JSON.

    if let html = String(data: data, encoding: .utf8) {
        let pattern = "data-api-data=\"([^\"]*)\""
        let regex = try! NSRegularExpression(pattern: pattern, options: .caseInsensitive)
        let matches = regex.matches(in: html, range: NSRange(0..<html.utf16.count)) //<-USE html.utf16.count, NOT html.count
        var results: [String] = []
        matches.forEach { match in
            let propValue = html[Range(match.range(at: 1), in: html)!]
                //### You need to replace character entities into actual characters
                .replacingOccurrences(of: "&quot;", with: "\"")
                .replacingOccurrences(of: "&apos;", with: "'")
                .replacingOccurrences(of: "&gt;", with: ">")
                .replacingOccurrences(of: "&lt;", with: "<")
                .replacingOccurrences(of: "&amp;", with: "&")
            results.append(propValue)
        }
        if let stringJSON = results.first {
            let dataJSON = stringJSON.data(using: .utf8)!
            do {
                let json = try JSONSerialization.jsonObject(with: dataJSON)
                print(json)
            } catch {
                print(error) //You should not ignore errors silently...
            }
        } else {
            print("NO result")
        }
    }
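
If you need this in more than one place, the same idea can be wrapped in two small functions. This is only a sketch under some assumptions: the names decodeEntities and extractAttribute are made up for illustration, the sample HTML is invented (including the div tag, which may differ on the real page), and only the five predefined entities plus numeric references such as &#34; or &#x22; are handled. It decodes in a single pass, so a value like &amp;quot; becomes the literal text &quot; rather than a double quote.

```swift
import Foundation

/// Hypothetical helper: decodes &quot; &apos; &lt; &gt; &amp; and numeric
/// references (&#34; / &#x22;) in one pass over the input.
func decodeEntities(_ text: String) -> String {
    let pattern = "&(#[xX][0-9a-fA-F]+|#[0-9]+|quot|apos|lt|gt|amp);"
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return text }
    let named: [String: String] = ["quot": "\"", "apos": "'", "lt": "<", "gt": ">", "amp": "&"]

    var output = ""
    var cursor = text.startIndex
    // NSRange(_:in:) counts UTF-16 units, which is what NSRegularExpression expects.
    for match in regex.matches(in: text, range: NSRange(text.startIndex..., in: text)) {
        guard let whole = Range(match.range, in: text),
              let nameRange = Range(match.range(at: 1), in: text) else { continue }
        // Copy the untouched text between the previous match and this one.
        output.append(contentsOf: text[cursor..<whole.lowerBound])
        let name = text[nameRange]
        if let replacement = named[String(name)] {
            output.append(replacement)
        } else {
            // Numeric reference: drop the leading "#", then an optional "x".
            let body = name.dropFirst()
            let isHex = body.hasPrefix("x") || body.hasPrefix("X")
            let value = isHex ? UInt32(body.dropFirst(), radix: 16) : UInt32(body, radix: 10)
            if let value = value, let scalar = Unicode.Scalar(value) {
                output.unicodeScalars.append(scalar)
            } else {
                // Not a valid code point: keep the original text as it was.
                output.append(contentsOf: text[whole])
            }
        }
        cursor = whole.upperBound
    }
    output.append(contentsOf: text[cursor...])
    return output
}

/// Hypothetical helper: returns the decoded value of the first
/// double-quoted occurrence of `attribute`, or nil if there is none.
func extractAttribute(named attribute: String, from html: String) -> String? {
    let pattern = NSRegularExpression.escapedPattern(for: attribute) + "=\"([^\"]*)\""
    guard let regex = try? NSRegularExpression(pattern: pattern),
          let match = regex.firstMatch(in: html, range: NSRange(html.startIndex..., in: html)),
          let valueRange = Range(match.range(at: 1), in: html) else { return nil }
    return decodeEntities(String(html[valueRange]))
}

// Invented sample input, not the real page source.
let sampleHTML = "<div id=\"js-initial-watch-data\" data-api-data=\"{&quot;id&quot;:&quot;sm1&quot;,&quot;note&quot;:&quot;a &amp; b&quot;}\"></div>"

if let value = extractAttribute(named: "data-api-data", from: sampleHTML),
   let data = value.data(using: .utf8) {
    do {
        let json = try JSONSerialization.jsonObject(with: data)
        print(json)
    } catch {
        print(error)
    }
}
```

Passing NSRange(html.startIndex..., in: html) avoids the html.count versus html.utf16.count mistake, because the range is computed in UTF-16 units for you. A regex is still a fragile way to read HTML (single-quoted attributes are not matched, for example), so treat this as a workaround for this specific page rather than a general parser.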

Upvotes: 1
