RayZal

Reputation: 63

Regex to match hashtags, mentions, URLs and special characters in tweet data

Example data from tweet:

I always meet @gEmbul at #kampus we always open the site https://www.youtube.com/ facebook# :) @007

The data is a string. I want to match mentions (with the @ symbol), hashtags (with the # symbol), any URL, and special characters.

I want to match the # both at the front of a hashtag (#kampus) and at the end of one (facebook#).

This is my code:

var data = "I always meet @gEmbul at #kampus we always open the site https://www.youtube.com/ facebook# :) @007"
function clean(data) {
	data = data.replace(/(?:https?|ftp):\/\/[\n\S]+/g, '')
      .replace(/\B\@\w\w+\b/g, '')
      .replace(/\B\#\w\w+\b/g, '');
	return data;
}
console.log(clean(data))

I want it to return:

i always meet at we always open site

Thanks.

Upvotes: 1

Views: 128

Answers (1)

Wiktor Stribiżew

Reputation: 627498

I suggest shrinking the pattern a bit: the two regexes you have differ in just one character, which can be handled with a [@#] character class, and since you remove all the matches anyway, you can combine the regexps into one with the | alternation operator:

var data = "I always meet @gEmbul at #kampus we always open the site https://www.youtube.com/ facebook# :) @007"
function clean(data) {
	data = data.replace(/(?:https?|ftp):\/\/[\n\S]+|\B[@#]\w+\b|\b\w+[@#]\B|\B[^\w\s]{2,}\B/g, '');
	return data;
}
document.body.innerHTML = clean(data);
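
Note that the replacement removes only the matched tokens, so the spaces that surrounded them stay in the string. A minimal sketch, assuming you also want to collapse the leftover whitespace (the extra step and the name cleanTweet are illustrative additions, not something the question required):

```javascript
// Same combined pattern as above: URLs, @/# prefixes, @/# suffixes, runs of punctuation.
const tweetPattern = /(?:https?|ftp):\/\/[\n\S]+|\B[@#]\w+\b|\b\w+[@#]\B|\B[^\w\s]{2,}\B/g;

// cleanTweet is a hypothetical helper name.
function cleanTweet(text) {
  return text
    .replace(tweetPattern, '') // drop the matched tokens
    .replace(/\s+/g, ' ')      // assumption: collapse the whitespace runs left behind
    .trim();                   // assumption: strip leading/trailing spaces
}

const sample = "I always meet @gEmbul at #kampus we always open the site https://www.youtube.com/ facebook# :) @007";
console.log(cleanTweet(sample));
// prints "I always meet at we always open the site"
```

Without the last two calls, the result keeps double spaces after "meet" and "at" and four trailing spaces after "site".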

Details:

  • (?:https?|ftp):\/\/[\n\S]+ - matches a URL; since [\n\S] also accepts newlines, the match may span across lines
  • | - or
  • \B[@#]\w+\b - a @ or # followed by 1+ word chars (as a whole word)
  • | - or
  • \b\w+[@#]\B - 1+ word chars followed by @ or # (as a whole word)
  • | - or
  • \B[^\w\s]{2,}\B - a non-word boundary, 2 or more chars other than word and whitespace chars, and again a non-word boundary. Remove both \B to match 2 or more non-whitespace/non-word chars in any context.
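
To illustrate the effect of \B in the last alternative, here is a small sketch comparing it with the variant that has \B removed. The test string and the variable names are made up for the demonstration:

```javascript
// Last alternative as written: the punctuation run must not touch a word char.
const strictPattern = /\B[^\w\s]{2,}\B/g;
// Variant with \B removed: any run of 2+ non-word, non-whitespace chars.
const loosePattern = /[^\w\s]{2,}/g;

// Hypothetical sample text, not taken from the question.
const punctuated = "wow!!! ok :) a->b";

const strictMatches = punctuated.match(strictPattern);
const looseMatches = punctuated.match(loosePattern);

console.log(strictMatches); // [ '!!', ':)' ]
console.log(looseMatches);  // [ '!!!', ':)', '->' ]
```

With \B, the first "!" directly after "wow" sits on a word boundary and is skipped, and "->" between two letters is not matched at all. Which behaviour you want depends on whether punctuation glued to words should be removed too.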

Upvotes: 1
