Reputation: 2011
I have the following:
Regex RgxUrl = new Regex("[^a-zA-Z0-9-_]");
foreach (var item in source.Split(' ').Where(s => s.StartsWith("@")))
{
    var mention = item.Replace("@", "");
    mention = RgxUrl.Replace(mention, "");
    usernames.Add(mention);
}
CURRENT INPUT > OUTPUT
@fish and fries are @good > fish, good
@fish and fries and @Mary's beer are @good > fish, good, marys
DESIRED INPUT > OUTPUT
@fish and fries are @good > fish, good
@fish and fries and @Mary's beer are @good > fish, good, Mary
The key here is to remove anything that's after an offending character. How can this be achieved?
Upvotes: 1
Views: 51
Reputation: 626870
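You split the string on spaces and check whether each chunk starts with @. If it does, you remove every @ symbol from the chunk, use a regex to strip every character that is not alphanumeric, - or _, and add the result to the list. That only deletes the offending characters; whatever follows them is kept, which is why @Mary's becomes Marys.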
You can do that with a single regex:
var res = Regex.Matches(source, @"(?<!\S)@([a-zA-Z0-9-_]+)")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();
Console.WriteLine(string.Join("; ", res)); // demo
usernames.AddRange(res); // in your code
See the C# demo
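For reference, here is a minimal self-contained sketch that runs the approach above against the two inputs from the question. The class name MentionDemo and the method name ExtractMentions are made up for this illustration, and the character class is written as [a-zA-Z0-9_-] (hyphen moved to the end so it is unambiguously literal), which should match the same characters as the pattern above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical wrapper for illustration; not part of the original code.
public static class MentionDemo
{
    // (?<!\S)  - "@" must be at the start of the string or preceded by whitespace
    // @        - the literal "@", kept outside the capturing group
    // ([...]+) - Group 1: one or more ASCII letters, digits, "_" or "-"
    private static readonly Regex MentionRegex =
        new Regex(@"(?<!\S)@([a-zA-Z0-9_-]+)");

    public static List<string> ExtractMentions(string source)
    {
        return MentionRegex.Matches(source)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }

    public static void Main()
    {
        // prints "fish, good"
        Console.WriteLine(string.Join(", ",
            ExtractMentions("@fish and fries are @good")));

        // prints "fish, Mary, good" (matching stops at the apostrophe)
        Console.WriteLine(string.Join(", ",
            ExtractMentions("@fish and fries and @Mary's beer are @good")));
    }
}
```

Because the group only consumes allowed characters, matching stops at the first offending character and the rest of the chunk is ignored, which is the behaviour asked for. Results come back in the order they appear in the input.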
Pattern details:
(?<!\S) - there must not be a non-whitespace character immediately to the left of the current position (i.e. there must be whitespace or the start of the string); this lookbehind is here because the original code split the string on whitespace
@ - a @ symbol (it is not part of the subsequent group because this symbol was removed in the original code)
([a-zA-Z0-9-_]+) - Capturing Group 1 (accessed with m.Groups[1].Value) matching one or more ASCII letters, digits, - and _ symbols.
Upvotes: 3