Michael T

Reputation: 719

Regex to find last word (including symbols) on line

I’m struggling to write a regex that finds the last word on a line. The word might include symbols such as !@#$%^&*[]. This needs to work for Unicode character sets.

The regex needs to return two groups, both ignoring any whitespace at the end of the line: the line itself and its last word.

This is what I have tried so far, but it does not work when the word contains symbols: (.*\b(\w+))\W*$

Expected results (input => group 1 and group 2):

'this test' => 'this test' and 'test'
' this test ' => 'this test' and 'test'
'this test$' => 'this test$' and 'test$'
'this# test$  ' => 'this# test$' and 'test$'

Upvotes: 1

Views: 441

Answers (4)

The fourth bird

Reputation: 163217

This might be a fairly broad match, but you could use two capturing groups built on \S, which matches any non-whitespace character. To make the match more specific, replace \S with a character class that lists exactly what you want to allow, for example using Unicode categories.

For example, the character class [\p{L}\p{N}_!@#$%^&*[\]] matches any letter (\p{L}), any numeric character (\p{N}), an underscore, or one of the special characters you want to allow.
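
As a rough illustration of the stricter character class, here is a minimal C# sketch. The class name WordChars and the method IsAllowedWord are made up for this example, and the \z anchor is my addition so that the whole token has to consist of allowed characters.

```csharp
using System.Text.RegularExpressions;

public static class WordChars
{
    // Hypothetical helper: the character class from the answer, anchored so that
    // every character of the token must be a letter, a number, '_' or a listed symbol.
    private static readonly Regex Allowed =
        new Regex(@"^[\p{L}\p{N}_!@#$%^&*[\]]+\z");

    public static bool IsAllowedWord(string token) => Allowed.IsMatch(token);
}
```

With this sketch, "test$" and "tëst#1" are accepted, while "a-b" is rejected because the hyphen is not in the class.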

The first group contains the second group plus the preceding word and the whitespace between them, which gives the full match without the trailing whitespace.

The second group captures the last word.

(\S+\s+(\S+))\s*$

Explanation

  • ( Capturing group 1
    • \S+\s+ Match 1+ non-whitespace chars, then 1+ whitespace chars
    • (\S+) Capturing group 2, match 1+ non-whitespace chars
  • ) Close capturing group 1
  • \s* Match optional trailing whitespace chars
  • $ End of string
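
To see how the two groups come out in .NET, here is a small sketch that applies the pattern to one line. LastWordDemo and Extract are hypothetical names chosen for this example. Note that the pattern needs at least two words on the line, so a single-word line returns null here.

```csharp
using System.Text.RegularExpressions;

public static class LastWordDemo
{
    // The pattern from the answer above.
    private static readonly Regex Pattern = new Regex(@"(\S+\s+(\S+))\s*$");

    // Returns (group 1, group 2), or null when the pattern does not match,
    // for example when the line has fewer than two words.
    public static (string Full, string Last)? Extract(string line)
    {
        Match m = Pattern.Match(line);
        if (!m.Success)
        {
            return null;
        }
        return (m.Groups[1].Value, m.Groups[2].Value);
    }
}
```

For the inputs from the question, Extract("this# test$  ") gives 'this# test$' and 'test$', and Extract(" this test ") gives 'this test' and 'test'.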


Upvotes: 0

Junitar

Reputation: 999

This Unicode regex will do what you want:

(\p{L}+\P{L}?\p{Zs}+(\p{L}+\P{L}?))(?<!\p{Zs})

Regex details:

  • \p{L}+ matches one or more Unicode characters in the category "letter".
  • \P{L}? matches one optional Unicode character that does not belong to the category "letter".
  • \p{Zs}+ matches one or more space separators.
  • (?<!\p{Zs}) is a negative lookbehind that prevents the match from ending in a space, which the optional \P{L}? could otherwise consume.
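
A quick way to check the groups in C# is sketched below. UnicodeLastWord and Groups are hypothetical names for this example. Because the pattern has no end-of-line anchor, Regex.Match returns the first pair of words it finds, so the sketch assumes two-word lines like the ones in the question.

```csharp
using System.Text.RegularExpressions;

public static class UnicodeLastWord
{
    // The pattern from the answer above.
    private static readonly Regex Pattern =
        new Regex(@"(\p{L}+\P{L}?\p{Zs}+(\p{L}+\P{L}?))(?<!\p{Zs})");

    // Returns { group 1, group 2 } for the first match, or null if nothing matches.
    public static string[] Groups(string line)
    {
        Match m = Pattern.Match(line);
        if (!m.Success)
        {
            return null;
        }
        return new[] { m.Groups[1].Value, m.Groups[2].Value };
    }
}
```

For example, Groups("привет мир!") returns "привет мир!" and "мир!".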


Upvotes: 0

Tyress

Reputation: 3653

Assuming the RegexOptions.Multiline option is enabled:

(?<=\s)([^\s][\S]{0,})(?=[\s]*?$)
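
Here is a minimal sketch of how this pattern could be run over a multi-line string with RegexOptions.Multiline. MultilineLastWords and Find are hypothetical names for this example. The lookbehind requires whitespace before the word, so a word at the very start of the input is never matched.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MultilineLastWords
{
    // The pattern from the answer above; Multiline makes $ match at each line end.
    private static readonly Regex Pattern =
        new Regex(@"(?<=\s)([^\s][\S]{0,})(?=[\s]*?$)", RegexOptions.Multiline);

    // Collects group 1 of every match, one entry per matched line end.
    public static List<string> Find(string text)
    {
        var words = new List<string>();
        foreach (Match m in Pattern.Matches(text))
        {
            words.Add(m.Groups[1].Value);
        }
        return words;
    }
}
```

Calling Find("this test\n this test \nthis# test$  ") yields test, test and test$.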


Upvotes: 0

Tim Biegeleisen

Reputation: 520968

For a non-regex option, we can split the input string on whitespace and take the last entry:

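using System;

string input = "this# test$";
// A null separator array splits on whitespace; RemoveEmptyEntries drops the empty
// entries that leading or trailing whitespace would otherwise produce.
string[] parts = input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
string last = parts[parts.Length - 1];  // throws if the input contains no words
Console.WriteLine(last);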

This prints:

test$

If you want a regex approach, then try matching on the following pattern:

\S+$

This matches the run of contiguous non-whitespace characters that ends right at the end of the input.
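
Since the question allows trailing whitespace, it may help to compare \S+$ with a lookahead variant. The sketch below is only an illustration: TrailingWordDemo and both method names are made up, and the \S+(?=\s*$) pattern is a variation of mine, not part of the answer above.

```csharp
using System.Text.RegularExpressions;

public static class TrailingWordDemo
{
    // The answer's pattern: the word must end exactly at the end of the input.
    // Returns an empty string when nothing matches.
    public static string LastWordStrict(string s) => Regex.Match(s, @"\S+$").Value;

    // Variation (an assumption, not from the answer): the lookahead lets
    // whitespace follow the word without including it in the match.
    public static string LastWordLenient(string s) => Regex.Match(s, @"\S+(?=\s*$)").Value;
}
```

With "this# test$  " as input, LastWordStrict returns an empty string because of the trailing spaces, while LastWordLenient returns test$.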

Upvotes: 1
