Reputation: 31
I'm trying to complete an exercise on a JavaScript learning website.
The instructions were:
Input: String of words, where some words may contain a hashtag/pound sign #.
Output: Array of strings that were prefixed with the hashtag/pound sign #, but do not contain the hashtag/pound sign #.
Pound signs alone do not count, for example: the string "#" would return an empty array.
My effort was this:
function getHashtags(post) {
  return /#(\w+)/.exec(post)
}
but it's resulting in this:
String Input: Hello #world
Output: [ '#world', 'world', index: 6, input: 'Hello #world' ]
String Input: #lol #sorryNotSorry #heya #coolbeans
Output: [ '#lol','lol', index: 0, input: '#lol #sorryNotSorry #heya #coolbeans']
String Input: # # # #
Output: null
String Input: this is an in#line hash
Output: [ '#line', 'line', index: 13, input: 'this is an in#line hash' ]
String Input: too ##many tags
Output: [ '#many', 'many', index: 5, input: 'too ##many tags' ]
String Input: invalid chars #$? #;wha
Output: null
String Input: "" //empty string
Output: null
String Input: #blue#red#yellow#green
Output: [ '#blue', 'blue', index: 0, input: '#blue#red#yellow#green' ]
I think I need lookbehind functionality, but I know JavaScript doesn't support it and I haven't been able to find a workaround. Can anyone help?
Upvotes: 3
Views: 77
Reputation: 627103
Hashtags cannot be within the middle of a word (e.g. "in#line hashtag" returns an empty array)
-- Use the non-word boundary \B, which does not allow a word character to appear right before the #. Also, to exclude the match when a # is in the middle of the hashtag (as in #blue#red), add a word boundary that is not followed by #: (?!#)\b.
Hashtags must precede alphabetical characters (e.g. "#120398" or "#?" are invalid)
-- Use [a-zA-Z] right after the # and then you can use \w. Or use [a-z] if you plan to add the /i modifier.
So, use /\B#+([a-z]\w*(?!#)\b)/gi
This will cover basic Latin-script based hashtag extraction.
function getHashtags(post) {
  var re = /\B#+([a-z]\w*(?!#)\b)/gi;
  // Declare arr and m locally so they do not leak as implicit globals.
  var arr = [], m;
  while ((m = re.exec(post)) !== null) {
    arr.push(m[1]); // m[1] is the hashtag name without the leading #
    console.log("Hashtag: " + m[0] + ", name: " + m[1]);
  }
  return arr;
}
var strs = ['##alot', 'Hello #world', '#lol #sorryNotSorry #heya #coolbeans', '# # # #', 'this is an in#line hash', 'too ##many tags', 'invalid chars #$? #;wha', '', '#blue#red#yellow#green'];
strs.forEach(function (str) {
  console.log(getHashtags(str));
});
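To see what each part of the pattern contributes, here is a small sketch that runs the full regex next to two weakened variants. The helper `extract` and the variable names `full`, `noNonWordBoundary`, and `noLookahead` are made up for this illustration and are not part of the exercise.

```javascript
// Hypothetical helper: collect capture group 1 of every match of a global regex.
function extract(re, text) {
  var out = [], m;
  re.lastIndex = 0; // global regexes keep state between exec calls, so reset it
  while ((m = re.exec(text)) !== null) {
    out.push(m[1]);
  }
  return out;
}

var full = /\B#+([a-z]\w*(?!#)\b)/gi;
var noNonWordBoundary = /#+([a-z]\w*(?!#)\b)/gi; // same pattern without \B
var noLookahead = /\B#+([a-z]\w*\b)/gi;          // same pattern without (?!#)

// \B rejects a # that directly follows a word character.
console.log(extract(full, 'this is an in#line hash'));              // []
console.log(extract(noNonWordBoundary, 'this is an in#line hash')); // [ 'line' ]

// (?!#)\b rejects a name that is immediately followed by another #.
console.log(extract(full, '#blue#red#yellow#green'));        // []
console.log(extract(noLookahead, '#blue#red#yellow#green')); // [ 'blue' ]

// [a-z] forces the name to start with a letter.
console.log(extract(full, '#120398 #a1')); // [ 'a1' ]
```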
Upvotes: 2
Reputation: 2834
You're actually doing it (almost) correctly. When you use exec it only returns the first match. If you continue to call exec (assuming you're using the global flag g) it will return the following matches, one per call. This example was taken from Mozilla's site:
var myRe = /ab*/g;
var str = 'abbcdefabh';
var myArray;
while ((myArray = myRe.exec(str)) !== null) {
  var msg = 'Found ' + myArray[0] + '. ';
  msg += 'Next match starts at ' + myRe.lastIndex;
  console.log(msg);
}
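The g flag matters here because it is what makes exec remember where it stopped. A minimal sketch of the difference, reusing the same pattern and string (the variable names `withG`, `withoutG`, `first`, `second`, `third`, `a`, and `b` are only for illustration):

```javascript
var withG = /ab*/g;
var withoutG = /ab*/;
var str = 'abbcdefabh';

// With g, each call resumes from lastIndex.
var first = withG.exec(str);  // matches 'abb', lastIndex becomes 3
var second = withG.exec(str); // matches 'ab', lastIndex becomes 9
var third = withG.exec(str);  // no more matches: null, lastIndex resets to 0

// Without g, lastIndex is ignored and every call starts from the beginning.
var a = withoutG.exec(str); // matches 'abb'
var b = withoutG.exec(str); // matches 'abb' again

console.log(first[0], second[0], third); // abb ab null
console.log(a[0], b[0]);                 // abb abb
```

This is also why the while loop above would never terminate on a matching string if the g flag were left off: exec would keep returning the first match.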
Might I add that everyone can learn from how well this question was asked. Nice job showing what you've done to solve the problem. I'll even show you how you would implement this.
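function getHashtags(post)
{
  // Declare everything locally so nothing leaks as an implicit global.
  var regex = /#(\w+)/g;
  var arr = [];
  var results;
  while ((results = regex.exec(post)) !== null)
  {
    arr.push(results[1]); // results[1] is the captured name without the #
  }
  return arr;
}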
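If your environment supports String.prototype.matchAll (added in ES2020), the same loop can be written more compactly. This is only a sketch under that assumption, and `getHashtagsWithMatchAll` is a name made up for this example. It keeps the simple pattern from above, so an inline tag such as in#line still yields 'line'.

```javascript
function getHashtagsWithMatchAll(post) {
  // matchAll requires the g flag and yields one match array per hit;
  // Array.from maps each match to its first capture group.
  return Array.from(post.matchAll(/#(\w+)/g), function (m) {
    return m[1];
  });
}

console.log(getHashtagsWithMatchAll('#lol #sorryNotSorry #heya #coolbeans'));
// [ 'lol', 'sorryNotSorry', 'heya', 'coolbeans' ]
console.log(getHashtagsWithMatchAll('# # # #')); // []
console.log(getHashtagsWithMatchAll(''));        // []
```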
Upvotes: 2