nexus404

Reputation: 31

JavaScript regex error: possibly needs lookbehind-type functionality?

I'm trying to complete an exercise on a JavaScript learning website.

The instructions were:

My effort was this:

function getHashtags(post) {
  return /#(\w+)/.exec(post)
}

but it's resulting in this:

String Input: Hello #world
Output: [ '#world', 'world', index: 6, input: 'Hello #world' ]

String Input: #lol #sorryNotSorry #heya #coolbeans
Output: [ '#lol','lol', index: 0, input: '#lol #sorryNotSorry #heya #coolbeans']

String Input: # # # #
Output: null

String Input: this is an in#line hash
Output: [ '#line', 'line', index: 13, input: 'this is an in#line hash' ]

String Input: too ##many tags
Output: [ '#many', 'many', index: 5, input: 'too ##many tags' ]

String Input: invalid chars #$? #;wha
Output: null

String Input: "" //empty string
Output: null

String Input: #blue#red#yellow#green
Output: [ '#blue', 'blue', index: 0, input: '#blue#red#yellow#green' ]

I think I need lookbehind functionality, but as far as I know JavaScript doesn't support it, and I haven't been able to find a workaround. Can anyone help?

Upvotes: 3

Views: 77

Answers (2)

Wiktor Stribiżew

Reputation: 627103

Hashtags cannot be within the middle of a word (e.g. "in#line hashtag" returns an empty array)

-- Use the non-word boundary \B, which does not allow a word character to appear right before the #. Also, to reject a tag that is immediately followed by another # (as in "#blue#red"), end the match at a word boundary that is not followed by #: (?!#)\b.

Hashtags must precede alphabetical characters (e.g. "#120398" or "#?" are invalid)

-- Use [a-zA-Z] right after the #, and then \w* for the rest of the tag. Or use [a-z] if you plan to add the /i modifier.
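The two rules above can be checked in isolation. This is only a small sketch: the `tag` and `checks` names are made up for illustration, and the pattern is the one from this answer without the g flag, so that `test()` keeps no state between calls.

```javascript
// Pattern from this answer, minus the g flag (test() then always starts at index 0).
const tag = /\B#+([a-z]\w*(?!#)\b)/i;

// Each entry records whether the input contains at least one valid hashtag.
const checks = {
  'in#line': tag.test('in#line'),           // \B fails: 'n' is a word character before '#'
  'Hello #world': tag.test('Hello #world'), // a space before '#', a letter after it
  '#120398': tag.test('#120398'),           // [a-z] rejects a leading digit
  '#?': tag.test('#?'),                     // [a-z] rejects punctuation
  '#blue#red': tag.test('#blue#red'),       // (?!#)\b rejects 'blue' followed by '#'
};

console.log(checks);
```

Only 'Hello #world' comes out as true here; the other four inputs are rejected for the reasons given in the comments.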

So, use

/\B#+([a-z]\w*(?!#)\b)/gi


This will cover basic Latin-script based hashtag extraction.

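function getHashtags(post) {
  // \B: no word character directly before '#'; (?!#)\b: the tag ends at a word boundary not followed by '#'
  var re = /\B#+([a-z]\w*(?!#)\b)/gi;
  var arr = []; // declared locally to avoid implicit globals
  var m;
  while ((m = re.exec(post)) !== null) {
    arr.push(m[1]); // m[1] is the tag name without the leading '#'
    console.log("Hashtag: " + m[0] + ", name: " + m[1]);
  }
  return arr;
}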


var strs = ['##alot', 'Hello #world', '#lol #sorryNotSorry #heya #coolbeans', '# # # #', 'this is an in#line hash', 'too ##many tags', 'invalid chars #$? #;wha', '', '#blue#red#yellow#green'];
strs.forEach(function (str) {
  console.log(getHashtags(str));
});
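For comparison, here is a sketch of the same extraction written with `String.prototype.matchAll`, assuming an ES2020-capable engine. The name `extractHashtags` is hypothetical and not part of the exercise; the pattern is unchanged.

```javascript
// Hypothetical alternative to getHashtags: same pattern, collected with matchAll.
function extractHashtags(post) {
  const re = /\B#+([a-z]\w*(?!#)\b)/gi;
  // matchAll requires the g flag and yields one match object per hit;
  // the mapping function keeps only capture group 1 (the name without '#').
  return Array.from(post.matchAll(re), (m) => m[1]);
}

const samples = [
  '##alot',
  '#lol #sorryNotSorry #heya #coolbeans',
  'this is an in#line hash',
  'too ##many tags',
  '#blue#red#yellow#green',
];

for (const s of samples) {
  console.log(JSON.stringify(s), '->', extractHashtags(s));
}
```

With these inputs, the inline and chained cases ('in#line', '#blue#red#yellow#green') give empty arrays, while '##alot' and 'too ##many tags' give a single name each.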

Upvotes: 2

d0nut

Reputation: 2834

You're actually doing it (almost) correctly. exec returns only one match per call. If the regex has the global flag g, each further call to exec resumes where the previous match ended and returns the next match, until it returns null. This example was taken from Mozilla's site:

var myRe = /ab*/g;
var str = 'abbcdefabh';
var myArray;
while ((myArray = myRe.exec(str)) !== null) {
  var msg = 'Found ' + myArray[0] + '. ';
  msg += 'Next match starts at ' + myRe.lastIndex;
  console.log(msg);
}

(Source: MDN documentation for RegExp.prototype.exec())
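To see why the g flag matters, here is a small sketch that contrasts the two cases on the same string as above. The variable names (`plain`, `withG`, `seen`) are just for illustration.

```javascript
const str = 'abbcdefabh';

// Without g, exec ignores lastIndex and always searches from the start,
// so repeated calls keep returning the first match.
const plain = /ab*/;
const first = plain.exec(str);
const again = plain.exec(str);
console.log(first[0], first.index, '|', again[0], again.index); // abb 0 | abb 0

// With g, each call resumes at lastIndex, and a failed call resets it to 0.
const withG = /ab*/g;
const seen = [];
let m;
while ((m = withG.exec(str)) !== null) {
  seen.push([m[0], m.index, withG.lastIndex]);
}
console.log(seen);            // [ [ 'abb', 0, 3 ], [ 'ab', 7, 9 ] ]
console.log(withG.lastIndex); // 0
```

This is also why a `while` loop over a regex without g never terminates when there is at least one match.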

Might I add that everyone can learn from how well this question was asked. Nice job showing what you've done to solve the problem. I'll even show you how you would implement this.

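function getHashtags(post)
{
    var regex = /#(\w+)/g; // g flag so exec advances through the string
    var arr = [];
    var results;

    while((results = regex.exec(post)) !== null)
    {
        arr.push(results[1]); // capture group 1: the tag without '#'
    }

    return arr;
}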

Upvotes: 2
