Reputation: 11468
I have a text:
" Alice, Bob Charlie "
and I would like to get pairs of a word (if any) and the whitespace after it. That is:
[("", " "), ("Alice,", " "), ("Bob", " "), ("Charlie", " ")]
In Python, I tried:
re.findall(r"(\S*)(\s*)", " Alice, Bob Charlie ")
which almost works - it just adds an empty pair ("", "") at the end. How can I get rid of it, other than with .pop()? Also, I don't really understand why it is there at all - after it matches the whitespace following Charlie it should finish, no?
Edit: to clarify - I want the first pair, i.e. no word with some whitespace. The last one - no word, no whitespace - is the one I want to get rid of. Without .pop(), possibly...
Upvotes: 1
Views: 2022
Reputation: 212845
re.findall(r"(\S+)(\s*)", " Alice, Bob Charlie ")
with a + sign after the \S returns what you probably want:
[('Alice,', ' '), ('Bob', ' '), ('Charlie', ' ')]
Otherwise \S*\s* can match the empty string at the end: zero-or-more followed by zero-or-more can also be zero-length.
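To see where the extra pair comes from, it may help to look at the span of each match rather than just its groups. This is only a small sketch using re.finditer on the text from the question (assuming single spaces between the words, as shown there):

```python
import re

text = " Alice, Bob Charlie "

# Record where each match starts and ends in the string.
spans = [m.span() for m in re.finditer(r"(\S*)(\s*)", text)]
print(spans)

# The last span is empty and sits at len(text): nothing is left to consume,
# but a pattern built only from * quantifiers still matches there.
print(spans[-1] == (len(text), len(text)))
```

The first four spans cover the whole string; the fifth is the zero-length match at the very end, which findall reports as ('', '').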
Another possibility (apart from .pop()) would be:
[a for a in re.findall(r"(\S*)(\s*)", " Alice, Bob Charlie ") if a != ('','')]
or:
re.findall(r"(\S*)(\s*)", " Alice, Bob Charlie ")[:-1]
both of which return exactly what you need (including the whitespace at the beginning):
[('', ' '), ('Alice,', ' '), ('Bob', ' '), ('Charlie', ' ')]
Upvotes: 2
Reputation: 940
Try changing \s* to \s+ to require at least 1 character of whitespace:
>>> re.findall(r"(\S*)(\s+)", " Alice, Bob Charlie ")
[('', ' '), ('Alice,', ' '), ('Bob', ' '), ('Charlie', ' ')]
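One caveat with \s+, as far as I can tell: if the text does not end with whitespace, the last word has nothing to pair with and is dropped. A possible alternative, offered only as a sketch, is to keep \S*\s* and add a negative lookahead (?!\Z) so that no match can start at the very end of the string. The helper name word_space_pairs below is made up for illustration:

```python
import re

def word_space_pairs(text):
    # Hypothetical helper: (?!\Z) refuses to start a match at the end of
    # the string, which is the only place the empty ('', '') pair appears.
    return re.findall(r"(?!\Z)(\S*)(\s*)", text)

with_trailing = word_space_pairs(" Alice, Bob Charlie ")
without_trailing = word_space_pairs(" Alice, Bob Charlie")

# For comparison: \s+ on input that lacks trailing whitespace.
plus_variant = re.findall(r"(\S*)(\s+)", " Alice, Bob Charlie")

print(with_trailing)
print(without_trailing)
print(plus_variant)
```

With the lookahead, the leading ('', ' ') pair is kept, the trailing empty pair is gone, and a final word without whitespace comes back as ('Charlie', '') instead of disappearing.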
Upvotes: 2