Ecir Hana

Reputation: 11468

Regex to match word and trailing whitespace pairs

I have a text:

"    Alice, Bob    Charlie  "

and I would like to get pairs of word (if any) and the whitespace after it. That is:

[("", "    "), ("Alice,", " "), ("Bob", "    "), ("Charlie", "  ")]

In Python, I tried:

re.findall(r"(\S*)(\s*)", "    Alice, Bob    Charlie  ")

which almost works: it just adds an empty pair ("", "") at the end. How can I get rid of it, other than with .pop()? Also, I don't really understand why it is there at all; after it matches Charlie's whitespace it should finish, shouldn't it?

Edit: to clarify, I do want the first pair, i.e. no word but some whitespace. The last one, with no word and no whitespace, is the one I want to get rid of, preferably without .pop().

Upvotes: 1

Views: 2022

Answers (3)

georg

Reputation: 214949

I think this would do it:

re.findall(r'(\S+|^)(\s*)', s)
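A self-contained sketch of this pattern, binding the question's text to `s` (the answer leaves `s` undefined, so that binding is an assumption). Without `re.MULTILINE`, `^` only matches at position 0, so the empty-word alternative can produce a pair only at the very start. At the end of the string neither `\S+` nor `^` can match, so no trailing `('', '')` is produced.

```python
import re

s = "    Alice, Bob    Charlie  "

# Either a run of non-whitespace, or the empty string at the very start of
# the input (^ only matches at position 0 unless re.MULTILINE is set).
pairs = re.findall(r"(\S+|^)(\s*)", s)
print(pairs)
# [('', '    '), ('Alice,', ' '), ('Bob', '    '), ('Charlie', '  ')]
```

If the text starts with a word instead of whitespace, `\S+` wins at position 0 and no leading empty pair is produced.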

Upvotes: 2

eumiro

Reputation: 212845

re.findall(r"(\S+)(\s*)", "    Alice, Bob    Charlie  ")

With a + sign after the \S, this returns what you probably want:

[('Alice,', ' '), ('Bob', '    '), ('Charlie', '  ')]

Otherwise, \S*\s* can match the empty string at the end: zero-or-more followed by zero-or-more can also be zero-length.
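The scanning can be made visible with `re.finditer`, which reports the span of each match. After `'Charlie  '` is consumed, the scanner sits at index 27, the end of the string. Both groups are allowed to be empty, so the pattern still matches there and yields a zero-length match at `(27, 27)`. `re.findall` includes empty matches in its result, which is where the trailing `('', '')` comes from.

```python
import re

s = "    Alice, Bob    Charlie  "

# finditer reports where each match starts and ends, which shows where the
# extra ('', '') comes from.
matches = [(m.span(), m.groups()) for m in re.finditer(r"(\S*)(\s*)", s)]
for span, groups in matches:
    print(span, groups)
# (0, 4) ('', '    ')
# (4, 11) ('Alice,', ' ')
# (11, 18) ('Bob', '    ')
# (18, 27) ('Charlie', '  ')
# (27, 27) ('', '')   <- zero-length match at the end of the string
```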

Other possibilities (apart from .pop()) would be:

[a for a in re.findall(r"(\S*)(\s*)", "    Alice, Bob    Charlie  ") if a != ('','')]

or:

re.findall(r"(\S*)(\s*)", "    Alice, Bob    Charlie  ")[:-1]

Both of these return exactly what you need (including the whitespace at the beginning):

[('', '    '), ('Alice,', ' '), ('Bob', '    '), ('Charlie', '  ')]
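A related variant, not part of the original answer, has the same effect as the `!= ('','')` filter. It iterates with `re.finditer` and keeps a pair only when the whole match (`m.group(0)`) is non-empty. The leading pair survives because its overall match is four spaces, while the zero-length match at the end is skipped. It also produces matches lazily, which may help on large inputs.

```python
import re

s = "    Alice, Bob    Charlie  "

# Keep (word, whitespace) pairs only when the overall match is non-empty;
# the zero-length match at the end of the string is skipped.
pairs = [m.groups() for m in re.finditer(r"(\S*)(\s*)", s) if m.group(0)]
print(pairs)
# [('', '    '), ('Alice,', ' '), ('Bob', '    '), ('Charlie', '  ')]
```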

Upvotes: 2

Wil Cooley

Reputation: 940

Try changing \s* to \s+ to require at least one character of whitespace:

>>> re.findall(r"(\S*)(\s+)", "    Alice, Bob    Charlie  ")
[('', '    '), ('Alice,', ' '), ('Bob', '    '), ('Charlie', '  ')]
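One thing worth checking with this variant, though the answer does not mention it: because `\s+` requires at least one whitespace character, a final word with no whitespace after it is not matched at all. Below is a quick comparison with the `(\S+|^)(\s*)` pattern from the first answer. It uses a hypothetical input with no trailing spaces.

```python
import re

text = "    Alice, Bob    Charlie"  # hypothetical input: no trailing whitespace

# \s+ needs at least one whitespace character after each word,
# so the final "Charlie" has nothing to pair with and is skipped.
plus = re.findall(r"(\S*)(\s+)", text)
print(plus)
# [('', '    '), ('Alice,', ' '), ('Bob', '    ')]

# \s* allows an empty trailing group, and the ^ alternative only fires at
# position 0, so the last word is kept with empty whitespace.
star = re.findall(r"(\S+|^)(\s*)", text)
print(star)
# [('', '    '), ('Alice,', ' '), ('Bob', '    '), ('Charlie', '')]
```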

Upvotes: 2
