Reputation: 550
I have a string from an input named 'text', and from it I would like to generate hashtags for a different field in my mongoose model:
req.body.tags = req.body.text
  .split('#')
  .map((tag) =>
    tag.trim().replace(/ +/g, ' ').split(' ').join('-').toLowerCase()
  )
  .filter((tag) => tag.length !== 0)
The code above is almost perfect, but every time I type a comma it gets inserted as a hashtag (or as part of one), which is something I'm trying to avoid. Here is an example of what I mean:
{
"text": "Hola, mi nombre es Kevin y tu como te llamas? #random, #userundefined"
}
The text above is the data I insert via Postman and this is the output:
"tags": [
"hola,-mi-nombre-es-kevin-y-tu-como-te-llamas?",
"random,",
"userundefined"
],
What I would like to get is this:
"tags": [
"random",
"userundefined"
],
I just want to retrieve the words that come right after a #, and nothing else. I don't want the commas after them, as shown in the random tag.
Upvotes: 1
Views: 864
Reputation: 33933
matchAll should be useful here.
The demo below is based on the documentation example. It returns an array of match arrays. In your case, you want match[1] of each match, hence the chained map.
let text = "Hola, mi nombre es Kevin y tu como te llamas? #random, #userundefined, #user-defined, #Grand_Father, #test123Four, #99startWithNumberIsWrong, #911, #Special!characters?"
let validHashtags = [...text
  .toLowerCase()
  .matchAll(/#(([a-z_]+)([\w_]+)?)/g)]
  .map(match => match[1])
console.log(validHashtags)
So that would be:
req.body.tags = [...req.body.text
  .toLowerCase()
  .matchAll(/#(([a-z_]+)([\w_]+)?)/g)]
  .map(match => match[1])
I used a regular expression that complies with the hashtag rules described on hashtag.org.
As for length and slang, you should simply advise your users about those when they enter the text.
Upvotes: 1
Reputation: 163217
If you don't want to match tags that consist of digits only or that start with a digit, you can use a capture group and, for example, matchAll to get the group value.
\B#([^\W\d]\w*)\b
The pattern matches:
\B# - a non-word boundary, followed by a literal #
( - open capture group 1
[^\W\d]\w* - a word character that is not a digit, followed by optional word characters
) - close group 1
\b - a word boundary
Example code:
const s = "Hola, mi nombre es Kevin y tu como te llamas? #random, #userundefined";
const regex = /\B#([^\W\d]\w*)\b/g;
console.log(Array.from(s.matchAll(regex), m => m[1]));
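To make the "digits only or starting with a digit" claim concrete, here is a small sketch that runs the same pattern against a few edge cases. The names hashtagPattern and extractTags, and the sample string, are my own for illustration and are not part of the question.

```javascript
// Same pattern as above; hashtagPattern and extractTags are hypothetical names.
const hashtagPattern = /\B#([^\W\d]\w*)\b/g;

// Returns only the captured group (the tag without the leading '#').
const extractTags = (text) =>
  Array.from(text.matchAll(hashtagPattern), (m) => m[1]);

// Made-up sample covering digits-only, leading digit, and '#' inside a word.
const sample = "#random, #911, #99start, #test123Four, #Grand_Father, mail#notatag";

console.log(extractTags(sample));
// → [ 'random', 'test123Four', 'Grand_Father' ]
```

"#911" and "#99start" are skipped because the first character after # must not be a digit, and "mail#notatag" is skipped because \B requires that the # is not glued to a preceding word character.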
Upvotes: 0
Reputation: 4337
Well, you could process req.body.text in a slightly different way, now that I see you want to pull the tags out of full texts.
Working example:
//you can declare this function outside of whatever web-server callback you have
function tagsFrom(text){ //function
  var toReturn=[], i=0, hashtag=false
  let isNumber=(n)=>Number(n).toString()==n //!isNaN(n) doesn't work properly
  let isValidChar=(c)=>c=="_"?true:isNumber(c)||(c.toUpperCase()!=c.toLowerCase())
  for(let c of text){
    if(typeof toReturn[i]!="string"){toReturn[i]=""} //because I can't add a character to undefined
    if(c=="#"){hashtag=true;continue} //hashtag found
    if(isValidChar(c)&&hashtag){toReturn[i]+=c} //character of a hashtag word
    else if(hashtag){hashtag=false;i++} //no longer the hashtag
  }
  return toReturn.filter(tag=>tag.length&&!isNumber(tag))
  //no empty values and a tag can't be ONLY a number
}
var req={body:{text:"Hola, mi nombre es Kevin y tu como te llamas? #random, #userundefined #1 #spanish101 #, #1name ##underscore_test"}} //test variable
req.body.tags = tagsFrom(req.body.text) //usage
console.log(req.body.tags)
EDIT: I parse hashtags based on this article about hashtags.
Upvotes: 0
Reputation: 52523
You might want to trim the commas off the ends of your tags after you split:
.map((tag) => {
  let newTag = tag.trim().replace(/ +/g, ' ').split(' ').join('-').toLowerCase();
  if (newTag.endsWith(',')) {
    newTag = newTag.substring(0, newTag.length - 1);
  }
  return newTag;
})
EDIT: If you're trying to get rid of anything that isn't preceded by a hashtag, you need to do your split a little differently. I would recommend finding the first # with .indexOf('#') and using .substring() to remove everything up to that index, then doing your split.
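A rough sketch of that idea, assuming the tags all come at the end of the text as in the question's example. The helper name tagsAfterFirstHash is made up for illustration, and the trailing-comma cleanup uses a small regex instead of the endsWith check above.

```javascript
// Hypothetical helper, not from the question: drop everything before the
// first '#', then split and clean up each piece.
function tagsAfterFirstHash(text) {
  const start = text.indexOf('#');
  if (start === -1) return []; // no '#' at all, so no tags

  return text
    .substring(start) // remove anything before the first '#'
    .split('#')
    .map((tag) =>
      tag
        .trim()
        .replace(/,+$/, '')  // strip trailing commas
        .trim()
        .replace(/ +/g, '-') // same space-to-dash rule as the question
        .toLowerCase()
    )
    .filter((tag) => tag.length !== 0);
}

console.log(
  tagsAfterFirstHash("Hola, mi nombre es Kevin y tu como te llamas? #random, #userundefined")
);
// → [ 'random', 'userundefined' ]
```

Note that this still treats everything between two # characters as one tag, so ordinary words following a tag would be joined to it with dashes. If tags can appear in the middle of a sentence, a regex-based match is probably the safer route.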
Upvotes: 0