Karl Tryggvason

Reputation: 143

JavaScript regex to insert links into lists

I have a bunch of tracklist content on my site that is in this format:

<div class="tracklist">
1. Artist - Title (Record Label)
2. Another artist - Title (Another label)
</div>

I want to use regular expressions to find the artist and label names and wrap them in links like so:

<div class="tracklist">
1. <a href="http://www.example.com/Artist">Artist</a> - Title <a href="http://www.example.com/Record+Label">(Record Label)</a>
2. <a href="http://www.example.com/Another+Artist">Another artist</a> - Title <a href="http://www.example.com/Another+label">(Another label)</a>  
</div>

I figured I can find the artist and label names with a JavaScript regex:

var artist = /[0-9]\. .*? -/gi
var label = /\(.*?\)/gi

use jQuery to find the matching strings:

$(".tracklist").html().match(label)
$(".tracklist").html().match(artist)

and then remove the number, period, spaces, dashes and parentheses with the substring() method. But what would be a good way to then insert the links and keep the text as well?

On a more general level, is this idea viable, or would it fall under the "don't parse HTML with JavaScript" rule? Would a server-side implementation be preferable (with some XML/XSL magic)?

Upvotes: 0

Views: 197

Answers (3)

Alan Moore

Reputation: 75252

I don't see any point in switching to XSLT, because you'd still have to process the content of the DIV as text. For that kind of thing, jQuery/regex is about as good as it gets. You just aren't using regexes as effectively as you could be. Like @arnaud said, you should match and process one whole line at a time, using capturing groups to break out the interesting parts. Here's the regex I would use:

/^(\d+)\.\s*([^-]+?)\s*-\s*([^(]+?)\s*\((.*)\)/

match[1] is the track number,
match[2] is the artist,
match[3] is the title, and
match[4] is the label

I also arranged it so that none of the surrounding whitespace or other characters are captured--in fact, most of the whitespace is optional. In my experience, formatted data like this often contains inconsistencies in spacing; this makes it more likely the regex will match what you want it to, and it gives you the power to correct the inconsistencies. (Of course, it can also contain more serious flaws, but those usually have to be dealt with on a case-by-case basis.)
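As a rough sketch of how those capturing groups might be used on a single line, assuming each line has already been split out of the DIV's text. The helper name `parseTrack` and the shape of the object it returns are made up for illustration; only the regex comes from above.

```javascript
// The pattern from above: track number, artist, title, label.
var TRACK = /^(\d+)\.\s*([^-]+?)\s*-\s*([^(]+?)\s*\((.*)\)/;

// Hypothetical helper: returns the captured parts, or null when the line
// does not look like a track entry.
function parseTrack(line) {
    var match = TRACK.exec(line);
    if (!match) {
        return null;
    }
    return {
        number: match[1],
        artist: match[2],
        title: match[3],
        label: match[4]
    };
}

// The spacing differs between these two lines, but both yield clean fields.
var tidy = parseTrack("1. Artist - Title (Record Label)");
var messy = parseTrack("2.Another artist-Title   (Another label)");
```

Because the whitespace sits outside the groups, `messy.artist` comes back as "Another artist" and `messy.label` as "Another label" with nothing to trim. Note that an artist name containing a hyphen would still trip up `[^-]+?`; that is one of the case-by-case flaws mentioned above.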

Upvotes: 0

Arnaud Le Blanc

Reputation: 99921

It doesn't fall under the "don't parse HTML with ..." rule, because you are not parsing HTML; you are parsing text and creating HTML from it.

You could get the whole text content of the div:

var text = $('.tracklist').text();

Then split into lines:

var lines = text.split(/\r?\n/);

And parse each line separately:

function parseLine(line) {
    // Groups: 1 = artist, 2 = title, 3 = label (without the parentheses)
    var match = line.match(/^\d+\.\s+([^-]+)\s-\s([^(]+?)\s*\((.*)\)/);
    if (match) {

        var artist = match[1], title = match[2], label = match[3];

        // create HTML here
    }
}

$.each(lines, function(index, line) {
    var elems = parseLine(line);
    // append elems to the div
});

The regex can be explained as follows:

/^\d+\.   # the track number followed by a dot, at the beginning of the line
\s+       # one or more whitespace characters after the dot
([^-]+)   # the artist: everything up to the "-"
\s-\s     # the "-" with one whitespace character on each side
([^(]+?)  # the title: everything up to the "(", minus trailing whitespace
\s*       # optional whitespace before the label
\((.*)\)/ # the label: everything between the parentheses
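The "create HTML here" step could look something like the sketch below, once the number, artist, title and label have been pulled out of a line. The base URL and the "+" for spaces are assumptions taken from the example output in the question; `escapeHtml`, `linkTo` and `buildTrackHtml` are hypothetical helper names, not part of jQuery.

```javascript
// Assumed from the question's example: links point at example.com and
// spaces in a name become "+".
var BASE_URL = "http://www.example.com/";

// Escape text before placing it inside HTML.
function escapeHtml(text) {
    return text
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Hypothetical helper: wrap `text` in a link whose target is built from `name`.
function linkTo(name, text) {
    var href = BASE_URL + encodeURIComponent(name).replace(/%20/g, "+");
    return '<a href="' + escapeHtml(href) + '">' + escapeHtml(text) + "</a>";
}

// Hypothetical helper: rebuild one tracklist line with artist and label linked.
function buildTrackHtml(number, artist, title, label) {
    return number + ". " + linkTo(artist, artist) + " - " + escapeHtml(title) +
        " " + linkTo(label, "(" + label + ")");
}

var html = buildTrackHtml("1", "Artist", "Title", "Record Label");
```

The resulting strings can then be joined with newlines and written back with `.html()`. If the page has several `.tracklist` divs, each one would need to be processed separately, since `$('.tracklist').text()` returns the combined text of all of them.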

Upvotes: 1

Pez Cuckow

Reputation: 14422

A server side implementation would definitely be better. Where are you pulling the data below from? Surely you have the information in an array or similar?

1. Artist - Title (Record Label)
2. Another artist - Title (Another label)

Also, a server-side implementation would degrade gracefully if the user didn't have JavaScript enabled (rare nowadays, but it does happen!)

Upvotes: 1
