Kirasiris

Reputation: 550

Generate hashtags from string in JavaScript

I have a string from an input named 'text', and from it I would like to generate hashtags for a different field in my Mongoose model:

req.body.tags = req.body.text
  .split('#')
  .map((tag) =>
    tag.trim().replace(/ +/g, ' ').split(' ').join('-').toLowerCase()
  )
  .filter((tag) => tag.length !== 0)

The code above almost works, but whenever the text contains a comma, the comma ends up in a hashtag (or as part of one), which is something I'm trying to avoid. Here is an example of what I mean:

{
    "text": "Hola, mi nombre es Kevin y tu como te llamas? #random, #userundefined"
}

The text above is the data I insert via Postman and this is the output:

"tags": [
  "hola,-mi-nombre-es-kevin-y-tu-como-te-llamas?",
  "random,",
  "userundefined"
],

What I would like to get is this:

"tags": [
  "random",
  "userundefined"
],

I only want to retrieve the words that are preceded by a #, and nothing else. I don't want the trailing commas, such as the one shown in the random tag.

Upvotes: 1

Views: 864

Answers (4)

Louys Patrice Bessette

Reputation: 33933

matchAll should be useful here.

The demo below is based on the documentation example. matchAll returns an iterator of match arrays, which the spread syntax turns into an array. In your case, you want match[1] of each match, hence the chained map.

let text = "Hola, mi nombre es Kevin y tu como te llamas? #random, #userundefined, #user-defined, #Grand_Father, #test123Four, #99startWithNumberIsWrong, #911, #Special!characters?"

let validHashtags = [...text
  .toLowerCase()
  .matchAll(/#(([a-z_]+)([\w_]+)?)/g)]
  .map(match => match[1])
  
console.log(validHashtags)

So that would be:

req.body.tags = [...req.body.text
  .toLowerCase()
  .matchAll(/#(([a-z_]+)([\w_]+)?)/g)]
  .map(match => match[1])

I used a regular expression that complies with hashtag.org:

  • No spaces
  • No Special Characters
  • Don't Start With or Use Only Numbers

As for length and slang, you should simply advise your users about them when they enter the text.
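To see how the three rules play out, here is a minimal sketch that wraps the same regular expression in a helper and runs it on one input per rule. The name extractHashtags is hypothetical, chosen only for this illustration.

```javascript
// Hypothetical helper wrapping the regular expression from this answer.
function extractHashtags(text) {
  return [...text.toLowerCase().matchAll(/#(([a-z_]+)([\w_]+)?)/g)]
    .map((match) => match[1]);
}

// Plain tags: the comma after "random" is not part of the match.
console.log(extractHashtags("#random, #userundefined")); // ["random", "userundefined"]

// Special characters end the tag; the rest is dropped, not rejected.
console.log(extractHashtags("#user-defined #Special!characters?")); // ["user", "special"]

// Tags that start with a digit or contain only digits do not match at all.
console.log(extractHashtags("#99startWithNumber #911")); // []

// Underscores and digits after the first letter are kept.
console.log(extractHashtags("#Grand_Father #test123Four")); // ["grand_father", "test123four"]
```

Note that a tag containing a special character is truncated rather than discarded, so "#user-defined" yields "user". If that is not what you want, you would need an extra check on the character that follows the match.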

Upvotes: 1

The fourth bird

Reputation: 163217

If you don't want to match tags that consist only of digits or that start with a digit, you can use a capture group and, for example, matchAll to get the group value.

\B#([^\W\d]\w*)\b

The pattern matches:

  • \B# A non word boundary followed by matching #
  • ( Capture group 1
    • [^\W\d]\w* Match a word character not being a digit, then match optional word characters
  • ) Close group 1
  • \b A word boundary

Example code:

const s = "Hola, mi nombre es Kevin y tu como te llamas? #random, #userundefined";
const regex = /\B#([^\W\d]\w*)\b/g;
console.log(Array.from(s.matchAll(regex), m => m[1]));
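A few more inputs may help show what the \B and [^\W\d] parts do in practice. This is only a sketch; the names hashtagPattern and tagsIn are made up for the illustration.

```javascript
const hashtagPattern = /\B#([^\W\d]\w*)\b/g;

// Hypothetical helper: returns capture group 1 of every match.
const tagsIn = (s) => Array.from(s.matchAll(hashtagPattern), (m) => m[1]);

// A '#' after a space or at the start of the string is accepted.
console.log(tagsIn("see #random, ok")); // ["random"]

// A '#' directly after a word character sits on a word boundary, so \B fails.
console.log(tagsIn("page.html#section")); // []

// The first character of the tag must be a letter or underscore.
console.log(tagsIn("#1name #911 #snake_case")); // ["snake_case"]

// A hyphen ends the tag, because '-' is not a word character.
console.log(tagsIn("#user-defined")); // ["user"]
```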

Upvotes: 0

The Bomb Squad

Reputation: 4337

Well, you could handle req.body.text in a slightly different way, now that I see you want to filter the tags out of full texts.

Working example:

//you can declare this function outside of whatever web-server callback you have
function tagsFrom(text){ //function
  var toReturn=[], i=0, hashtag=false
  let isNumber=(n)=>Number(n).toString()==n //!isNaN(n) doesn't work properly
  let isValidChar=(c)=>c=="_"?true:isNumber(c)||(c.toUpperCase()!=c.toLowerCase())
  for(let c of text){
    if(typeof toReturn[i]!="string"){toReturn[i]=""} //because I can't add a character to undefined
    if(c=="#"){hashtag=true;continue} //hashtag found
    if(isValidChar(c)&&hashtag){toReturn[i]+=c} //character of a hashtag word
    else if(hashtag){hashtag=false;i++} //no longer the hashtag
  }
  return toReturn.filter(tag=>tag.length&&!isNumber(tag))
  //no empty values and a tag can't be ONLY a number
}

var req={body:{text:"Hola, mi nombre es Kevin y tu como te llamas? #random, #userundefined #1 #spanish101 #, #1name ##underscore_test"}} //test variable
req.body.tags = tagsFrom(req.body.text) //usage
console.log(req.body.tags)

EDIT: The parsing rules are based on an article about hashtags.

Upvotes: 0

Jason

Reputation: 52523

You might want to trim the commas off the ends of your tags after you split:

.map((tag) => {
  let newTag = tag.trim().replace(/ +/g, ' ').split(' ').join('-').toLowerCase();
  if (newTag.endsWith(',')) {
    newTag = newTag.substring(0, newTag.length - 1);
  }
  return newTag;
})

EDIT: If you're trying to get rid of anything that isn't preceded by a hashtag, you need to do your split a little differently. I would recommend finding the first '#' with .indexOf('#'), using .substring() to remove everything before that index, and then doing your split.
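A rough sketch of that idea, combined with the comma trimming above, could look like this. The function name tagsAfterFirstHash is hypothetical, and the rest of the pipeline is kept from the question.

```javascript
// Hypothetical helper: drop everything before the first '#', then split.
function tagsAfterFirstHash(text) {
  const firstHash = text.indexOf('#');
  if (firstHash === -1) return []; // no hashtag in the text at all

  return text
    .substring(firstHash) // remove the leading text that is not a hashtag
    .split('#')
    .map((tag) => {
      let newTag = tag.trim().replace(/ +/g, ' ').split(' ').join('-').toLowerCase();
      if (newTag.endsWith(',')) {
        newTag = newTag.substring(0, newTag.length - 1); // trim one trailing comma
      }
      return newTag;
    })
    .filter((tag) => tag.length !== 0);
}

const text = "Hola, mi nombre es Kevin y tu como te llamas? #random, #userundefined";
console.log(tagsAfterFirstHash(text)); // ["random", "userundefined"]
```

This only removes the text before the first '#'. Ordinary words that follow a tag still end up in it (for example "hi #random and more" gives "random-and-more"), so it works best when the hashtags come at the end of the text.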

Upvotes: 0
