Xerath

Reputation: 1099

Python - regex, blank element at the end of the list?

I have this code:

print(re.split(r"[\s\?\!\,\;]+", "Holy moly, feferoni!"))

which results in:

['Holy', 'moly', 'feferoni', '']

How can I get rid of this last blank element, and what causes it? If this is a dirty way to strip punctuation and whitespace from a string, how else could I write it with a regex?

Upvotes: 1

Views: 201

Answers (4)

marsouf

Reputation: 1147

import re
print(re.findall(r'[^\s?!,;]+', 'Holy moly, feferoni!'))

Upvotes: 0

Dirk

Reputation: 471

The first thing that comes to mind is something like this:

>>> import re
>>> mystring = re.split(r"[\s\?\!\,\;]+", "Holy moly, feferoni!")
>>> mystring
['Holy', 'moly', 'feferoni', '']
>>> mystring.pop()  # removes and returns the last element
''
>>> print(mystring)
['Holy', 'moly', 'feferoni']
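
One caveat with an unconditional pop: if the input does not end in punctuation, it throws away a real word. A minimal sketch of a guarded version follows; the helper name split_words is made up for illustration and is not from the question.

```python
import re

def split_words(text):
    """Split on whitespace/punctuation and drop a trailing empty string, if any.

    split_words is a hypothetical helper name used only for this sketch.
    """
    parts = re.split(r"[\s?!,;]+", text)
    # Only pop when the last element really is the empty string
    if parts and parts[-1] == "":
        parts.pop()
    return parts

print(split_words("Holy moly, feferoni!"))
print(split_words("Holy moly"))
```

This only handles a trailing delimiter. A string that starts with a delimiter would still produce an empty first element.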

Upvotes: 1

Cu3PO42

Reputation: 1473

You get the empty string as the last element of your list because the RegEx matches the final ! and splits there. re.split gives you what's before the ! and what's after it, but after it there's simply nothing, i.e. an empty string. You would have the same problem in the middle of the string if you hadn't wisely added the + to your RegEx.
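
To make the effect of the + visible, here is a small sketch comparing the two patterns on the string from the question (the variable names are just for illustration):

```python
import re

text = "Holy moly, feferoni!"

# With "+", the run ", " counts as one delimiter; only the trailing "!"
# leaves an empty string behind.
with_plus = re.split(r"[\s?!,;]+", text)

# Without "+", "," and " " are two separate delimiters, so an empty
# string also shows up between them.
without_plus = re.split(r"[\s?!,;]", text)

print(with_plus)     # ['Holy', 'moly', 'feferoni', '']
print(without_plus)  # ['Holy', 'moly', '', 'feferoni', '']
```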

If you want to elegantly get rid of the possible empty string, use filter. In Python 3 it returns an iterator, so wrap it in a call to list if you need an actual list:

list(filter(None, re.split(r"[\s?!,;]+", "Holy moly, feferoni!")))

This will result in:

['Holy', 'moly', 'feferoni']

What this does is remove every element that is not truthy. The filter function normally keeps only the elements for which a given function returns a true value, but if you pass None instead of a function, it tests the truthiness of each element itself. Because an empty string is falsy and every non-empty string is truthy, this removes every empty string from the list.
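
For comparison, a list comprehension with a plain truthiness test should give the same result as filter(None, ...). This is a sketch, not something the question asked for, and the variable names are mine:

```python
import re

parts = re.split(r"[\s?!,;]+", "Holy moly, feferoni!")

# filter(None, ...) keeps only truthy elements; list() materialises the iterator
filtered = list(filter(None, parts))

# The same truthiness test written as a list comprehension
comprehension = [p for p in parts if p]

print(filtered)       # ['Holy', 'moly', 'feferoni']
print(comprehension)  # ['Holy', 'moly', 'feferoni']
```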

Also note that I removed the escaping of special characters in the character class, as it is simply not necessary there and just makes the RegEx harder to read.
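
A quick check that the escaped and unescaped character classes behave the same on the example string (a sketch with illustrative variable names):

```python
import re

text = "Holy moly, feferoni!"

# Pattern from the question, with every punctuation character escaped
escaped = re.split(r"[\s\?\!\,\;]+", text)

# Same character class without the redundant backslashes
unescaped = re.split(r"[\s?!,;]+", text)

print(escaped == unescaped)  # True
```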

Upvotes: 1

user2555451

Reputation:

Expanding on what @HamZa said in his comment, you could use re.findall with a negated character set:

>>> from re import findall
>>> findall(r"[^\s?!,;]+", "Holy moly, feferoni!")
['Holy', 'moly', 'feferoni']
>>>
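
A side benefit of this approach is that it also copes with delimiters at the start of the string, where re.split would leave a leading empty string as well. A small sketch, assuming an input with leading spaces (my own variation on the question's string):

```python
import re

text = "  Holy moly, feferoni!"  # note the leading spaces

# Splitting on delimiters leaves empty strings at both ends
split_result = re.split(r"[\s?!,;]+", text)

# Matching runs of non-delimiters never produces empty strings
findall_result = re.findall(r"[^\s?!,;]+", text)

print(split_result)    # ['', 'Holy', 'moly', 'feferoni', '']
print(findall_result)  # ['Holy', 'moly', 'feferoni']
```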

Upvotes: 2
