Reputation: 719
I’m struggling to find the last word on a line. The word might include symbols like !@#$%^&*[] etc. This needs to work for unicode character sets.
The regex needs to return two groups: the text of the line and its last word (both ignoring any whitespace at the end of the line).
This is what I have tried so far: (.*\b(\w+))\W*$
but it does not work when the word contains symbols.
'this test' => 'this test' and 'test'
' this test ' => 'this test' and 'test'
'this test$' => 'this test$' and 'test$'
'this# test$ ' => 'this# test$' and 'test$'
Upvotes: 1
Views: 441
Reputation: 163217
It might be a bit of a broad match, but you could use 2 capturing groups built on \S, which matches a non-whitespace char. To make it more specific, you could replace \S with exactly what you want to match, for example using Unicode categories.
For example, you might use a character class such as [\p{L}\p{N}_!@#$%^&*[\]], which matches any kind of letter or numeric character via \p{L} and \p{N}, followed by the special chars that you want to allow.
The first group also contains the second group, including the whitespace in between, to get the full match without the trailing whitespace.
The second group captures the last word.
(\S+\s+(\S+))\s*$
Explanation
(        Capturing group 1
\S+\s+   Match 1+ non-whitespace chars, then 1+ whitespace chars
(\S+)    Capturing group 2, match 1+ non-whitespace chars
)        Close capturing group 1
\s*      Match 0+ whitespace chars
$        End of string
Regex demo with \S | .NET demo with special characters
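As a minimal sketch of how the two groups could be read in .NET, the helper below runs the pattern from this answer and, optionally, a narrower variant in which \S is swapped for the suggested character class. The names LastWordGroups, Extract, and the narrow flag are made up for illustration, and the brackets inside the class are escaped here for safety.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastWordGroups
{
    // Pattern quoted from the answer: any run of non-whitespace counts as a word.
    private static readonly Regex Broad = new Regex(@"(\S+\s+(\S+))\s*$");

    // Narrower variant (assumption): \S swapped for the character class suggested in the answer.
    private const string WordChar = @"[\p{L}\p{N}_!@#$%^&*\[\]]";
    private static readonly Regex Narrow =
        new Regex("(" + WordChar + @"+\s+(" + WordChar + @"+))\s*$");

    // Returns group 1 and group 2, or null when the line does not match.
    public static (string Phrase, string Word)? Extract(string line, bool narrow = false)
    {
        Match m = (narrow ? Narrow : Broad).Match(line);
        if (!m.Success) return null;
        return (m.Groups[1].Value, m.Groups[2].Value);
    }
}
```

For the four sample lines in the question, Extract should return the pairs listed there, for example ("this# test$", "test$") for 'this# test$ '. Note that a line containing a single word returns null, because the pattern requires two words separated by whitespace.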
Upvotes: 0
Reputation: 999
This unicode regex will do what you want:
(\p{L}+\P{L}?\p{Zs}+(\p{L}+\P{L}?))(?<!\p{Zs})
Regex details:
\p{L}+ matches one or multiple Unicode characters in the category "letter".
\P{L}? matches one optional Unicode character not belonging to the category "letter".
\p{Zs}+ matches one or multiple spaces.
(?<!\p{Zs}) negative lookbehind that prevents matching a space at the end of the string.
Upvotes: 0
Reputation: 3653
Assuming you have the RegexOptions.Multiline option on:
(?<=\s)([^\s][\S]{0,})(?=[\s]*?$)
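A rough sketch of how this pattern might be applied to multi-line text, collecting the last word of each line. The class and method names (PerLineLastWord, Find) are hypothetical; only the pattern and the RegexOptions.Multiline flag come from the answer.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PerLineLastWord
{
    // Multiline makes $ match before each line break as well as at the end of the input.
    private static readonly Regex LastWord =
        new Regex(@"(?<=\s)([^\s][\S]{0,})(?=[\s]*?$)", RegexOptions.Multiline);

    // Returns the captured last word of every line that matches.
    public static List<string> Find(string text)
    {
        var words = new List<string>();
        foreach (Match m in LastWord.Matches(text))
            words.Add(m.Groups[1].Value);
        return words;
    }
}
```

Calling Find("this test \nthis# test$  \nпривет мир!") should give test, test$ and мир!. Because of the lookbehind, a word is only matched when a whitespace character precedes it, so an input consisting of a single word produces no match.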
Upvotes: 0
Reputation: 520968
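We can also split the input string on whitespace and take the last entry, for a non-regex option:
using System;

string input = "this# test$";
// A null separator array splits on any whitespace; RemoveEmptyEntries drops the empty entry that trailing whitespace would leave
string[] parts = input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
string last = parts[parts.Length - 1];
Console.WriteLine(last);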
This prints:
test$
If you want a regex approach, then try matching on the following pattern:
\S+$
This will capture all contiguous non whitespace characters which appear right before the end of the input.
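One thing worth checking is how this pattern behaves when the line ends in whitespace, as in the question's samples. The sketch below compares \S+$ with a lookahead variant; the variant and the names TrailingWord, Strict, and Lenient are illustrative assumptions, not part of the answer.

```csharp
using System.Text.RegularExpressions;

public static class TrailingWord
{
    // Pattern from the answer: the word must touch the end of the input.
    public static string Strict(string input) =>
        Regex.Match(input, @"\S+$").Value;

    // Hypothetical variant: the lookahead tolerates trailing whitespace.
    public static string Lenient(string input) =>
        Regex.Match(input, @"\S+(?=\s*$)").Value;
}
```

Strict("this# test$") returns test$, but Strict("this# test$ ") returns an empty string because the trailing space blocks the match. Lenient returns test$ in both cases.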
Upvotes: 1