Reputation: 1820
Here is the html...
<iframe width="100%" height="166" scrolling="no" frameborder="no"
src="http://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F11111111&auto_play=false
&show_artwork=true&color=c3000d&show_comments=false&liking=false
&download=false&show_user=false&show_playcount=false"></iframe>
I'm using NodeJS. I'm trying to extract the track ID, in this case 11111111, which follows tracks%2F. What is the most stable method for doing this?
Should I use a regex or a JS string method such as substring() or match()?
Upvotes: 1
Views: 3739
Reputation: 1821
Update for 2019...
This builds off of blueiur's answer and walks through a solution in more detail. JSDOM needs to be installed before you can use it:
npm install jsdom
Now, according to the documentation, you can import the JSDOM class like this:
const jsdom = require('jsdom');
const { JSDOM } = jsdom;
You've already got some html you want to parse, so I'll use your example and define it as a template literal:
const data = `<iframe width="100%" height="166" scrolling="no" frameborder="no"
src="http://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F11111111&auto_play=false
&show_artwork=true&color=c3000d&show_comments=false&liking=false
&download=false&show_user=false&show_playcount=false"></iframe>`;
Here's the fun part... parse the html in NodeJS:
const { document } = (new JSDOM(data)).window;
What's happening here? You're creating a new JSDOM object from the provided HTML and grabbing the document property of its window property. From this point on, you can use document.getElementsByTagName() and other similar functions just like you would in a browser.
To continue with your specific example, you want to extract the src attribute of the only iframe in the document. There are multiple ways to do that. One is to use getElementsByTagName to pull the first iframe like this:
const src1 = document.getElementsByTagName('iframe')[0].src;
Now that we have the src attribute, we can split it apart and process the url query value. This is where we will use the URL class which comes with NodeJS. According to the documentation, we can get the search parameters by creating a URL object and accessing its searchParams property like this:
const params = (new URL(src1)).searchParams;
Now you've got the query string as a URLSearchParams object and you can access individual terms like this:
const scURL = params.get('url');
If you look at the contents of scURL now, you'll find it is the embedded url which was passed as a query value (already percent-decoded), so we can parse that with another URL object and extract the pathname property like this:
const src2 = (new URL(scURL)).pathname;
We're getting close now, and can split the path apart to get the value you wanted using JavaScript's standard string functions:
const val = src2.split('/')[2];
And print the result:
console.log(val);
... which produces this output:
11111111
To summarize, here is the complete code:
const jsdom = require('jsdom');
const { JSDOM } = jsdom;
const data = `<iframe width="100%" height="166" scrolling="no" frameborder="no"
src="http://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F11111111&auto_play=false
&show_artwork=true&color=c3000d&show_comments=false&liking=false
&download=false&show_user=false&show_playcount=false"></iframe>`;
const { document } = (new JSDOM(data)).window;
const src1 = document.getElementsByTagName('iframe')[0].src;
const params = (new URL(src1)).searchParams;
const scURL = params.get('url');
const src2 = (new URL(scURL)).pathname;
const val = src2.split('/')[2];
console.log(val);
Feel free to consolidate that and eliminate intermediate values as desired.
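If you already have the src string in hand, the URL steps above can be wrapped in one function. This is only a sketch: extractTrackId is a hypothetical name, it relies solely on the URL class built into NodeJS, and it assumes the ID is the path segment right after "tracks". It returns null rather than throwing when the input doesn't look as expected.

```javascript
// Hypothetical helper (not part of any library): pull the track ID out of a
// SoundCloud player src URL using only Node's built-in URL class.
function extractTrackId(src) {
  let embedded;
  try {
    // The player URL carries the API URL in its "url" query parameter;
    // searchParams.get() returns it already percent-decoded.
    embedded = new URL(src).searchParams.get('url');
  } catch (err) {
    return null; // src was not an absolute URL
  }
  if (!embedded) return null; // no "url" parameter present

  let pathname;
  try {
    pathname = new URL(embedded).pathname; // e.g. "/tracks/11111111"
  } catch (err) {
    return null; // the embedded value was not a URL either
  }

  // Assumption: the ID is the segment immediately after "tracks".
  const segments = pathname.split('/').filter(Boolean);
  const i = segments.indexOf('tracks');
  return i !== -1 && segments[i + 1] ? segments[i + 1] : null;
}

const playerSrc = 'http://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F11111111&auto_play=false';
console.log(extractTrackId(playerSrc)); // prints 11111111
```

Looking up the "tracks" segment by name instead of using a fixed index keeps the function from returning the wrong segment if the path ever gains a prefix.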
Upvotes: 2
Reputation: 44436
The Right™ way to do this is to parse the HTML using some XML parser, get the URL that way, and then use a reg-exp to parse the URL.
If for some reason you don't have an infinite amount of time and energy, one of the proposed purely reg-exp solutions would work.
Upvotes: 0
Reputation: 653
If you know tracks%2F is only going to show up once you could do:
var your_track_ID = src.split(/tracks%2F/)[1].split(/&/)[0];
There are probably better ways, but that should work fine for your purposes.
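One caveat with the one-liner is that it throws a TypeError if tracks%2F is missing, because [1] is then undefined. A guarded variant might look like the sketch below; trackIdBySplit is a hypothetical name, and it makes the same assumption that the marker appears at most once.

```javascript
// Hypothetical helper: same split-based idea, but returns null instead of
// throwing when the "tracks%2F" marker is absent.
function trackIdBySplit(src) {
  const parts = src.split('tracks%2F');
  if (parts.length < 2) return null; // marker not found

  // Everything after the marker, up to the next "&" (or end of string).
  const id = parts[1].split('&')[0];
  return id || null; // treat an empty ID as "not found"
}

console.log(trackIdBySplit('http://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F11111111&auto_play=false')); // prints 11111111
console.log(trackIdBySplit('http://w.soundcloud.com/player/?auto_play=false')); // prints null
```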
Upvotes: 2
Reputation: 1507
You can find the track ID with the node modules url, jsdom and qs.
Try this
var jsdom = require('jsdom');
var url = require('url');
var qs = require('qs');
var str = '<iframe width="100%" height="166" scrolling="no" frameborder="no" '
+ 'src="http://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F11111111&auto_play=false'
+ '&show_artwork=true&color=c3000d&show_comments=false&liking=false'
+ '&download=false&show_user=false&show_playcount=false"></iframe>';
jsdom.env({
html: str,
scripts: [
'http://code.jquery.com/jquery-1.5.min.js'
],
done: function(errors, window) {
var $ = window.$;
var src = $('iframe').attr('src');
var aRes = qs.parse(decodeURIComponent(url.parse(src).query)).url.split('/');
var track_id = aRes[aRes.length-1];
console.log("track_id =", track_id);
}
});
The result is:
track_id = 11111111
Upvotes: 1
Reputation: 35263
It's generally a terribly bad idea to parse HTML with a regular expression, but this might be forgivable. I'd look for the complete URL for safety:
var pattern = /w\.soundcloud\.com.*tracks%2F(\d+)&/
, trackID = (html.match(pattern) || [])[1]
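Note that the pattern above requires an & right after the digits, so it would miss a track ID that ends the attribute value. The sketch below relaxes that; trackIdFromHtml is a hypothetical name, and the pattern assumes the src value contains no quote characters before the ID.

```javascript
// Hypothetical helper: regex-based extraction that does not depend on a
// trailing "&" after the track ID.
function trackIdFromHtml(html) {
  // [^"']*? stays inside the quoted attribute value (and, unlike ".",
  // also matches line breaks) until it reaches "tracks%2F".
  const pattern = /w\.soundcloud\.com[^"']*?tracks%2F(\d+)/;
  const match = html.match(pattern);
  return match ? match[1] : null;
}

const sampleHtml = '<iframe src="http://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Ftracks%2F11111111"></iframe>';
console.log(trackIdFromHtml(sampleHtml)); // prints 11111111
```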
Upvotes: 1
Reputation: 46
If the track id is always 8 digits and the html doesn't change, you can do this (match() returns an array, so take its first element):
var trackId = (html.match(/\d{8}/) || [])[0];
Upvotes: 0