marlon

Reputation: 7683

How to match the last space in this pattern and string?

s = "DL666 DL777 DL888 这波值不值你下载"

I want to match characters starting from the last space, so the match should be:

m = "这波值不值你下载"

I wrote this pattern, but it didn't work:

p = '\s.+?$'

I thought the ? made the quantifier non-greedy.

Upvotes: 1

Views: 1090

Answers (3)

pho

Reputation: 25479

It doesn't work because the regex engine searches from left to right and returns the first (leftmost) match it can complete.

The regex \s.+?$ means:

  • Match one whitespace \s
  • Followed by one or more (+) of any character (.), as few as possible (?)
  • Until you reach the end of the line (or string) $

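The $ right after the lazy quantifier makes the ? useless: the match has to extend to the end of the line either way. Laziness only affects where a match ends, not where it starts.

That's why it matches everything from the first space in your string onward: " DL777 DL888 这波值不值你下载" (including the leading space).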
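To see this for yourself, here is a small sketch using the string from the question. The variable names lazy and greedy are just for illustration. It shows that the lazy and greedy versions return the same match, and that both start at the first space rather than the last.

```python
import re

s = "DL666 DL777 DL888 这波值不值你下载"

# The engine tries start positions from left to right, so the first
# position where \s can match is the first space (index 5).
lazy = re.search(r'\s.+?$', s)
greedy = re.search(r'\s.+$', s)

# The lazy .+? still has to grow until $ can match, so both patterns
# end up with the same text.
print(lazy.start())                       # index where the match begins
print(repr(lazy.group(0)))                # includes the leading space
print(lazy.group(0) == greedy.group(0))   # True
```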

On the other hand, if you change the . to "anything except whitespace" (\S), you get what you want, because the match can no longer span a space. While you're at it, you might as well remove the ?. And since you don't care about the whitespace before the non-whitespace, get rid of the \s as well. \S+$ matches 这波值不值你下载.

Note that while this regex works, it's cheaper to just use str.rindex() and slice the string, as Green Cloak Guy suggests in their answer to this question.

Upvotes: 4

Luis Guzman

Reputation: 1026

A simpler regex that you can also use as a search string in vim is:

\S\+$

It matches a run of non-whitespace characters up to the end of the line (or the end of the string in Python). Note that you have to escape the + in vim.

Here is the Python script:

import re
s = "DL666 DL777 DL888 这波值不值你下载"
m = re.search(r'\S+$', s)
print(m.group(0))

I tested it in python3:

$ python3
Python 3.6.8 (default, Mar  9 2021, 15:08:44) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44.0.3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> s = "DL666 DL777 DL888 这波值不值你下载"
>>> m = re.search(r'\S+$', s)
>>> print(m.group(0))
这波值不值你下载
>>>

Upvotes: 2

Green Cloak Guy

Reputation: 24691

If you insist on using regex for this, I'd just do a greedy search for "anything followed by whitespace" and then use a capture group to take everything after that.

import re

s = "DL666 DL777 DL888 这波值不值你下载"
m = re.match(r'^.*\s(.*)$', s).group(1)
# '这波值不值你下载'
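As a rough illustration of why the greedy .* matters here (the names greedy_tail and lazy_tail are made up for this sketch): the greedy version first consumes the whole string and backtracks to the last whitespace, while a lazy .*? stops at the first whitespace it can use.

```python
import re

s = "DL666 DL777 DL888 这波值不值你下载"

# Greedy .* grabs as much as possible, then backs off until \s matches,
# which lands on the LAST space.
greedy_tail = re.match(r'^.*\s(.*)$', s).group(1)

# Lazy .*? grows one character at a time, so \s matches the FIRST space
# and the capture group takes everything after it.
lazy_tail = re.match(r'^.*?\s(.*)$', s).group(1)

print(greedy_tail)
print(lazy_tail)
```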

An even more straightforward regex, if you're content with not starting at the beginning of the line, would be

m = re.search(r'[^\s]*$', s).group(0)
# '这波值不值你下载'

However, for something this simple, you might be better off just using str.rindex() to find the last occurrence of a space and taking everything after it:

m = s[s.rindex(' ') + 1:]
# '这波值不值你下载'
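One thing to watch for with this approach: str.rindex() raises ValueError when the string contains no space at all. If that can happen in your data, str.rpartition() may be a gentler option, since it returns the whole string when the separator is missing. A minimal sketch, where after_last_space is a hypothetical helper name and not something from the standard library:

```python
s = "DL666 DL777 DL888 这波值不值你下载"


def after_last_space(text):
    """Return the part of text after its last ' ' (hypothetical helper).

    rpartition returns ('', '', text) when no space is present, so the
    last element is the whole string in that case.
    """
    return text.rpartition(' ')[2]


print(after_last_space(s))
print(after_last_space("nospace"))

# rindex, by contrast, raises when there is nothing to find.
try:
    "nospace".rindex(' ')
    rindex_raised = False
except ValueError:
    rindex_raised = True
print(rindex_raised)
```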

Upvotes: 1
