Reputation: 2730
I'm trying to match lines that contain chords, but I need to make sure each match is surrounded by whitespace or is at the start of the line, without consuming those characters, since I don't want them returned to the caller.
E.g.
Standard Tuning (Capo on fifth fret)
Time signature: 12/8
Tempo: 1.5 * Quarter note = 68 BPM
Intro: G Em7 G Em7
G Em7
I heard there was a secret chord
G Em7
That David played and it pleased the lord
C D G/B D
But you don't really care for music, do you?
G/B C D
Well it goes like this the fourth, the fifth
Em7 C
The minor fall and the major lift
D B7/D# Em
The baffled king composing hallelujah
Chorus:
G/A G/B C Em C G/B D/A G
Hal - le- lujah, hallelujah, hallelujah, hallelu-u-u-u-jah ....
This almost works, except that it also matches the "B" in "68 BPM". How do I make sure that only chords are matched? I don't want it to match the B in "Before" or the D or E in "SUBSIDE".
This is the function I use to match each line:
function getChordMatches(line) {
    var pattern = /[ABCDEFG](?:#|##|b|bb)?(?:min|m)?(?:maj|add|sus|aug|dim)?[0-9]*(?:\/[ABCDEFG](?:#|##|b|bb)?)?/g;
    var chords = line.match(pattern);
    var positions = [];
    var match; // declared to avoid creating an implicit global
    while ((match = pattern.exec(line)) != null) {
        positions.push(match.index);
    }
    return {
        "chords":chords,
        "positions":positions
    };
}
That is, I want arrays of the form ["A", "Bm", "C#"], not [" A", "Bm ", " C# "].
Edit:
I made it work using the accepted answer. I had to make some adjustments to accommodate the leading whitespace. Thanks for taking the time, everyone!
function getChordMatches(line) {
    var pattern = /(?:^|\s)[A-G](?:##?|bb?)?(?:min|m)?(?:maj|add|sus|aug|dim)?[0-9]*(?:\/[A-G](?:##?|bb?)?)?(?!\S)/g;
    var chords = line.match(pattern);
    var chordLength = -1;
    var positions = [];
    var match; // declared to avoid creating an implicit global
    while ((match = pattern.exec(line)) != null) {
        positions.push(match.index);
    }
    // Strip the leading space consumed by (?:^|\s) and shift the position right by the same amount.
    for (var i = 0; chords && i < chords.length; i++) {
        chordLength = chords[i].length;
        chords[i] = chords[i].trim();
        positions[i] += chordLength - chords[i].length;
    }
    return {
        "chords":chords,
        "positions":positions
    };
}
Upvotes: 0
Views: 337
Reputation: 13631
Try the following:
function getChordMatches( line ) {
    var match,
        pattern = /(?:^|\s)([A-G](?:##?|bb?)?(?:min|m)?(?:maj|add|sus|aug|dim)?\d*(?:\/[A-G](?:##?|bb?)?)?)(?=$|\s)/g,
        chords = [],
        positions = [];
    while ( match = pattern.exec(line) ) {
        chords.push( match[1] );
        // match.index points at the consumed leading space (if any), so skip past it.
        positions.push( match.index + match[0].length - match[1].length );
    }
    return {
        "chords" : chords,
        "positions" : positions
    };
}
It uses (?:^|\s) to make sure the chord is either at the start of the line or preceded by a whitespace character, and the positive lookahead (?=$|\s) to make sure the chord is followed by whitespace or is at the end of the line. Parentheses capture the chord itself, which is then accessed as match[1].
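Because (?:^|\s) consumes the preceding space, match.index points at that space rather than at the chord whenever the chord is not at the start of the line. A minimal sketch that derives each chord's own column from the lengths of the full match and of the captured group (the name chordColumns is hypothetical):

```javascript
// Hypothetical helper: returns the column where each chord starts.
function chordColumns(line) {
  var pattern = /(?:^|\s)([A-G](?:##?|bb?)?(?:min|m)?(?:maj|add|sus|aug|dim)?\d*(?:\/[A-G](?:##?|bb?)?)?)(?=$|\s)/g;
  var columns = [];
  var match;
  while ((match = pattern.exec(line)) !== null) {
    // match[0] may start with one consumed whitespace character;
    // the difference in length tells us how far to skip.
    columns.push(match.index + match[0].length - match[1].length);
  }
  return columns;
}

console.log(chordColumns("G Em7 C")); // [ 0, 2, 6 ]
console.log(chordColumns("Tempo: 1.5 * Quarter note = 68 BPM")); // []
```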
Upvotes: 0
Reputation: 56809
I assume you have already split the input into lines and that the function processes them one at a time.
You just need to check that the line has a chord as the first item before extracting them:
if (/^\s*[A-G](?:##?|bb?)?(?:min|m)?(?:maj|add|sus|aug|dim)?[0-9]*(?:\/[A-G](?:##?|bb?)?)?(?!\S)/.test(line)) {
    // Match the chords here
}
I added ^\s* in front to check from the beginning of the line, and added (?!\S) to check that the first chord is followed by a whitespace character (\s) or the end of the line.
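A small sketch of that gate applied to a few sample lines (the name isChordLine is hypothetical). Note that the "Intro:" line from the question fails the test, because its first token is a label rather than a chord:

```javascript
// Hypothetical helper: true if the first non-space token on the line is a chord.
function isChordLine(line) {
  return /^\s*[A-G](?:##?|bb?)?(?:min|m)?(?:maj|add|sus|aug|dim)?[0-9]*(?:\/[A-G](?:##?|bb?)?)?(?!\S)/.test(line);
}

var lines = [
  "G Em7",
  "  D B7/D# Em",
  "Tempo: 1.5 * Quarter note = 68 BPM",
  "Intro: G Em7 G Em7",
  "Before the fall"
];
lines.forEach(function (line) {
  console.log(isChordLine(line), JSON.stringify(line));
});
```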
Note that I made some minor changes to your regex, since A## (assuming it is a valid chord) will not be matched by your current regex. The regex engine tries the alternatives of an alternation in order, so # is attempted first in #|##. It finds that A# matches and returns that match without ever trying ##. Either reversing the order to ##|# or using the greedy quantifier ##? fixes the problem, since both try the longer alternative first.
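A quick way to see the alternation order at work, using stripped-down patterns that keep only the note and the sharps. The last line shows a nuance: once something after the group can fail, such as (?!\S), the engine backtracks into the alternation and still finds ##, so the order matters most when nothing after the group forces a retry:

```javascript
// "#" is tried first and succeeds, so "##" is never attempted.
console.log(/[A-G](?:#|##)?/.exec("A##")[0]); // A#

// Longer alternative first, or a greedy optional second "#", takes both.
console.log(/[A-G](?:##|#)?/.exec("A##")[0]); // A##
console.log(/[A-G](?:##?)?/.exec("A##")[0]);  // A##

// With a trailing (?!\S), "A#" fails the lookahead, so the engine backtracks and tries "##".
console.log(/[A-G](?:#|##)?(?!\S)/.exec("A##")[0]); // A##
```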
If you are sure that whenever the first item on a line is a chord, the rest are chords too, you can simply split on whitespace instead of matching:
line.split(/\s+/);
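One caveat with splitting: chord lines are often indented to line up with the lyrics, and splitting a string that has leading or trailing whitespace on /\s+/ yields empty strings at the ends. Trimming first avoids that:

```javascript
var line = "   G/B   C  D ";
console.log(line.split(/\s+/));        // [ '', 'G/B', 'C', 'D', '' ]
console.log(line.trim().split(/\s+/)); // [ 'G/B', 'C', 'D' ]
```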
Update
If you just want to match your pattern wherever it appears, regardless of whether the line is a chord line or a sentence (which is what your current code does), use:
/(?:^|\s)[A-G](?:##?|bb?)?(?:min|m)?(?:maj|add|sus|aug|dim)?[0-9]*(?:\/[A-G](?:##?|bb?)?)?(?!\S)/
This regex is to be placed in the code you have in your question.
I check that the chord is preceded by a whitespace character or is at the beginning of the line with (?:^|\s). You need to trim the leading space from the result, though.
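Using \b instead of (?:^|\s) will avoid the leading-space issue, but the meaning is different: \b only requires a non-word character next to the chord, so punctuation also counts as a boundary. Unless you know the input well enough, I'd advise against it.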
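To illustrate the difference, here is the same chord core wrapped once in \b and once in the whitespace checks, tried on a made-up prose line. The \b version picks up the G before the full stop, while the whitespace version rejects it:

```javascript
// Shared chord core; only the boundary checks differ between the two regexes.
var core = "[A-G](?:##?|bb?)?(?:min|m)?(?:maj|add|sus|aug|dim)?[0-9]*(?:\\/[A-G](?:##?|bb?)?)?";
var withBoundary = new RegExp("\\b" + core + "\\b", "g");
var withSpaces = new RegExp("(?:^|\\s)" + core + "(?!\\S)", "g");

var line = "Play it in G.";
console.log(line.match(withBoundary)); // [ 'G' ]
console.log(line.match(withSpaces));   // null
```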
Another way is to split the string on \s+ and test each token against the following regex (note the ^ at the beginning and the $ at the end):
/^[A-G](?:##?|bb?)?(?:min|m)?(?:maj|add|sus|aug|dim)?[0-9]*(?:\/[A-G](?:##?|bb?)?)?$/
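A minimal sketch of the split-and-test approach (the name chordTokens is hypothetical). Empty tokens produced by leading whitespace simply fail the anchored test. Note that this approach drops column positions; if you need them, an exec loop is the better fit:

```javascript
// Anchored pattern: the whole token must be a chord.
var chordToken = /^[A-G](?:##?|bb?)?(?:min|m)?(?:maj|add|sus|aug|dim)?[0-9]*(?:\/[A-G](?:##?|bb?)?)?$/;

// Hypothetical helper: keep only the whitespace-separated tokens that are chords.
function chordTokens(line) {
  return line.split(/\s+/).filter(function (token) {
    return chordToken.test(token);
  });
}

console.log(chordTokens("  D B7/D# Em"));                       // [ 'D', 'B7/D#', 'Em' ]
console.log(chordTokens("Tempo: 1.5 * Quarter note = 68 BPM")); // []
```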
Upvotes: 1
Reputation: 16033
To answer the specific question in the title, use a lookahead: (?=\s), when embedded in a regex, ensures that the next character is whitespace without consuming it.
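The same idea works on the left side with a lookbehind, which JavaScript supports since ES2018 (older engines, including Safari before 16.4, throw a SyntaxError on it). A sketch, assuming such an engine plus String.prototype.matchAll (ES2020): (?<!\S) means "not preceded by a non-space character", i.e. preceded by whitespace or at the start of the line, so nothing needs trimming and match.index is already the chord's column. This is a variant of the question's getChordMatches, not the accepted answer's code:

```javascript
// (?<!\S): start of line or preceded by whitespace.
// (?!\S):  end of line or followed by whitespace.
// Neither assertion consumes characters, so each match is the chord alone.
var chordPattern = /(?<!\S)[A-G](?:##?|bb?)?(?:min|m)?(?:maj|add|sus|aug|dim)?\d*(?:\/[A-G](?:##?|bb?)?)?(?!\S)/g;

function getChordMatches(line) {
  var chords = [];
  var positions = [];
  // matchAll iterates over a copy of the regex, so chordPattern.lastIndex is never disturbed.
  for (var match of line.matchAll(chordPattern)) {
    chords.push(match[0]);
    positions.push(match.index);
  }
  return { chords: chords, positions: positions };
}

console.log(getChordMatches("G/A G/B C Em")); // chords [ 'G/A', 'G/B', 'C', 'Em' ], positions [ 0, 4, 8, 10 ]
console.log(getChordMatches("SUBSIDE Before 68 BPM").chords); // []
```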
Upvotes: 0
Reputation: 38102
Adding \b (word boundary) to the start and end works for me. Also, you can use A-G instead of ABCDEFG. Thus:
> re = /\b[A-G](?:#|##|b|bb)?(?:min|m)?(?:maj|add|sus|aug|dim)?[0-9]*(?:\/[A-G](?:#|##|b|bb)?)?\b/g
/\b[A-G](?:#|##|b|bb)?(?:min|m)?(?:maj|add|sus|aug|dim)?[0-9]*(?:\/[A-G](?:#|##|b|bb)?)?\b/g
> 'G/A G/B C Em C G/B D/A G'.match(re)
["G/A", "G/B", "C", "Em", "C", "G/B", "D/A", "G"]
> 'Tempo: 1.5 * Quarter note = 68 BPM'.match(re)
null
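One caveat worth checking before relying on \b: # is not a word character, so there is no word boundary between a trailing # and the following space or end of line. Instead of failing, the engine backtracks and drops the sharp. Reusing the regex above:

```javascript
var re = /\b[A-G](?:#|##|b|bb)?(?:min|m)?(?:maj|add|sus|aug|dim)?[0-9]*(?:\/[A-G](?:#|##|b|bb)?)?\b/g;

// The trailing "#" is silently lost; "F#m" survives because it ends in a word character.
console.log('D B7/D# Em'.match(re)); // [ 'D', 'B7/D', 'Em' ]
console.log('C# F#m'.match(re));     // [ 'C', 'F#m' ]
```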
Upvotes: 0