BLAZORLOVER

Reputation: 2011

Regex to get every word starting with @ and remove everything after the first disallowed character

I have the following:

Regex RgxUrl = new Regex("[^a-zA-Z0-9-_]");
foreach (var item in source.Split(' ').Where(s => s.StartsWith("@")))
{
    var mention = item.Replace("@", "");
    mention = RgxUrl.Replace(mention, "");
    usernames.Add(mention);
}

CURRENT INPUT > OUTPUT

DESIRED INPUT > OUTPUT

The key requirement is to drop everything from the first disallowed character onward, not just the disallowed characters themselves. How can this be achieved?

Upvotes: 1

Views: 51

Answers (1)

Wiktor Stribiżew

Reputation: 626870

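You split the string on spaces, keep the chunks that start with @, remove every @ from each chunk, use a regex to delete every character that is not an ASCII letter, digit, - or _, and then add the result to the list. Because the regex deletes the disallowed characters instead of stopping at the first one, everything that follows an offending character is kept.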
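To make the difference concrete, here is a minimal sketch (top-level statements, C# 9 or later) that runs the original strip logic and a truncating variant on the same chunk. The helper names StripDisallowed and TruncateAtDisallowed and the sample chunk are hypothetical, not from the question; the truncating variant keeps the split-based approach but only takes the leading run of allowed characters after the @.

```csharp
using System;
using System.Text.RegularExpressions;

// Original approach (as in the question): remove every @, then delete every
// character outside [a-zA-Z0-9-_], so text after an offending character is kept.
static string StripDisallowed(string token) =>
    Regex.Replace(token.Replace("@", ""), "[^a-zA-Z0-9-_]", "");

// Hypothetical truncating variant: skip the leading @ and keep only the
// leading run of allowed characters, stopping at the first offending one.
static string TruncateAtDisallowed(string token) =>
    Regex.Match(token.Substring(1), "^[a-zA-Z0-9-_]*").Value;

// Hypothetical chunk, as produced by source.Split(' ').
string chunk = "@john.doe,";
Console.WriteLine(StripDisallowed(chunk));      // johndoe
Console.WriteLine(TruncateAtDisallowed(chunk)); // john
```

Note that a second @ inside a chunk is itself a disallowed character, so the truncating variant turns @ann@bob into ann, while the original code produces annbob.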

You can do that with a single regex:

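using System;
using System.Linq;
using System.Text.RegularExpressions;

// Capture the name after each @ that starts the string or follows whitespace;
// the match ends at the first character outside [a-zA-Z0-9-_].
var res = Regex.Matches(source, @"(?<!\S)@([a-zA-Z0-9-_]+)")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();
Console.WriteLine(string.Join("; ", res)); // demo
usernames.AddRange(res); // in your code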


Pattern details:

  • (?<!\S) - a negative lookbehind: the character immediately to the left of the current position must not be a non-whitespace character, so the match starts either at the beginning of the string or right after whitespace (this mirrors the original code, which split the string on spaces; note that \S also treats tabs and line breaks as whitespace)
  • @ - a literal @ symbol (kept outside the capturing group because the original code removed it from the result)
  • ([a-zA-Z0-9-_]+) - Capturing Group 1 (accessed with m.Groups[1].Value) matching one or more ASCII letters, digits, - and _ characters. The match ends at the first character outside this set, so everything after an offending character is dropped.
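To see each part of the pattern at work, here is a self-contained sketch (top-level statements, C# 9 or later) that runs the same pattern on a made-up input string. The input is hypothetical and chosen only to exercise each rule: a name at the very start of the string, names cut at a dot or an exclamation mark, an @ inside an e-mail address, and a lone @.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Same pattern as above: an @ at the start of the string or after
// whitespace, followed by a captured run of allowed characters.
var pattern = @"(?<!\S)@([a-zA-Z0-9-_]+)";

// Hypothetical input exercising each part of the pattern.
string source = "@alice hi @john.doe and @ann_b-2! mail me@example.com @ @x";

var names = Regex.Matches(source, pattern)
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();

// @alice: start of string; @john.doe: cut at the dot; @ann_b-2!: cut at "!";
// me@example.com: rejected by the lookbehind; lone @: the group needs 1+ chars.
Console.WriteLine(string.Join("; ", names)); // alice; john; ann_b-2; x
```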

Upvotes: 3
