Reputation: 629
I'm trying to parse the injected data of the torrents' list on movies.io (for example, here).
I need to parse the whole array of torrents and put it into an array of hashes (it already has this structure in the injected code), so that I can use it easily. But I can't seem to find how to do this. I can delete the &quot; and &amp; with gsub!, but that's all I've got for now.
The data I collect would look like this:
[
{id: 18210, sha1: 13BB6A6F65EA6203ACE218E830395AE61427EDBD, name: Star Wars Episode IV A New Hope 1977 1080p Bluray x264 anoXmous},
{id: 3701, sha1: D3F3C5C237299B2B9F4EC84B7F46F6E9E0424574, name: Star Wars Episode IV A New Hope 1977 720p BRRiP XViD AC3 - IMAGi}
]
Upvotes: 1
Views: 236
Reputation: 105
We also have a proper API endpoint for sources such as torrents, netflix, etc.
For example, http://movies.io/m/1R/sources.json
We're working on a real API with documentation, but it's not ready yet!
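As a rough sketch of how the endpoint above could be consumed from Ruby, using only the standard library: the helper name fetch_sources is hypothetical, and the shape of the returned JSON is not documented here, so the parsed result is returned as-is without assuming any particular keys.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: fetches a sources.json URL and returns the parsed JSON.
# The structure of the response is an unknown here, so nothing is assumed
# about its keys; the caller has to inspect the result.
def fetch_sources(url)
  response = Net::HTTP.get_response(URI.parse(url))
  # Fail loudly on anything other than a 2xx response.
  raise "Request failed with HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  JSON.parse(response.body)
end
```

Calling fetch_sources('http://movies.io/m/1R/sources.json') and inspecting the result would then show which fields are available.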
Upvotes: 4
Reputation: 146123
So what's happening is: the data-injected attribute you are scraping is in fact just JSON, but it's encoded in HTML. After the browser parses it, it's in the DOM as ordinary JSON.
In fact, you can easily see how it's handled by looking at Scripts in the Chrome JavaScript Console and then clicking Pretty Print in order to keep your sanity. You will see it assign the attribute to f and then later use it in the expression f ? u($.parseJSON(f)) : ...
Since you are presumably using an HTML parser, I think you probably have the real original JSON there somewhere. In any case, some component in your system needs to stop stripping out the HTML entities that originally supplied the quotes. Once the entities are decoded rather than deleted, you can just feed the string to a JSON parser.
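A minimal sketch of that idea in Ruby, assuming the attribute value was read as raw, still-escaped text: decode the entities with CGI.unescapeHTML instead of deleting them, then hand the result to JSON.parse. The sample string and the helper name parse_injected are made up for illustration; the real attribute may contain more fields.

```ruby
require 'cgi'
require 'json'

# Hypothetical sample of the raw attribute value, with quotes still
# HTML-encoded as &quot; (the real data may have additional fields).
raw_attribute = '[{&quot;id&quot;:18210,' \
                '&quot;sha1&quot;:&quot;13BB6A6F65EA6203ACE218E830395AE61427EDBD&quot;,' \
                '&quot;name&quot;:&quot;Star Wars Episode IV A New Hope 1977 1080p Bluray x264 anoXmous&quot;}]'

# Decode the HTML entities back into real characters, then parse the
# resulting JSON into an array of hashes with symbol keys.
def parse_injected(raw)
  JSON.parse(CGI.unescapeHTML(raw), symbolize_names: true)
end

torrents = parse_injected(raw_attribute)
torrents.each do |torrent|
  puts "#{torrent[:id]} #{torrent[:sha1]} #{torrent[:name]}"
end
```

If the attribute is read through an HTML parser instead, the value usually comes back already decoded, in which case the unescape step can be dropped and the string passed straight to JSON.parse.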
Upvotes: 1