Reputation: 1099
I have this code:
import re
print(re.split(r"[\s\?\!\,\;]+", "Holy moly, feferoni!"))
which results in:
['Holy', 'moly', 'feferoni', '']
How can I get rid of this last blank element, and what caused it? If this is a dirty way to get rid of punctuation and spaces from a string, how else can I write it, but in regex?
Upvotes: 1
Views: 201
Reputation: 471
The first thing that comes to my mind is something like this:
>>> import re
>>> mystring = re.split(r"[\s\?\!\,\;]+", "Holy moly, feferoni!")
>>> mystring
['Holy', 'moly', 'feferoni', '']
>>> mystring.pop()  # removes and returns the last element
''
>>> print(mystring)
['Holy', 'moly', 'feferoni']
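Note that pop removes the last element whether or not it is empty, so it would drop a real word if the input did not end in punctuation. A minimal sketch of a safer variant, assuming a helper named split_words (a hypothetical name, not something from the question):

```python
import re

def split_words(text):
    # split_words is a hypothetical helper name used only for illustration.
    parts = re.split(r"[\s?!,;]+", text)
    # Drop the final element only when it really is an empty string.
    if parts and parts[-1] == "":
        parts.pop()
    return parts

print(split_words("Holy moly, feferoni!"))
print(split_words("Holy moly, feferoni"))
```

This only handles a trailing empty string. If the input starts with punctuation, an empty string still shows up at the front of the list.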
Upvotes: 1
Reputation: 1473
You get the empty string as the last element of your list because the RegEx splits after the last !. It gives you what's before the ! and what's after it, but after it there's simply nothing, i.e. an empty string! You would have the same problem in the middle of the string if you hadn't wisely added the + to your RegEx.
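To make that concrete, here is a small sketch of the three cases (trailing delimiter, leading delimiter, and the same character class without the +). The sample strings other than the one from the question are made up for illustration:

```python
import re

pattern = r"[\s?!,;]+"

# Trailing delimiter: nothing follows the final "!", so the last piece is "".
trailing = re.split(pattern, "Holy moly, feferoni!")

# Leading delimiter: nothing precedes the first "!", so the first piece is "".
leading = re.split(pattern, "!Holy moly")

# Without "+", the "," and the space are two separate delimiters,
# and the empty string between them becomes an element of its own.
no_plus = re.split(r"[\s?!,;]", "moly, feferoni")

print(trailing)  # ['Holy', 'moly', 'feferoni', '']
print(leading)   # ['', 'Holy', 'moly']
print(no_plus)   # ['moly', '', 'feferoni']
```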
If you want to elegantly get rid of the optional empty string, do:
list(filter(None, re.split(r"[\s?!,;]+", "Holy moly, feferoni!")))
This will result in:
['Holy', 'moly', 'feferoni']
In Python 3, filter returns an iterator, so the call to list is only needed if you can't work with an iterator. In Python 2, filter already returns a list.
What this does is remove every element that is not truthy. The filter function generally returns only the elements that satisfy a requirement given as a function, but if you pass None instead of a function, it tests the truth value of each element itself. Because an empty string is falsy and every non-empty string is truthy, this removes every empty string from the list.
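If filter(None, ...) reads as too terse, the same truthiness test can be written as a list comprehension. A short sketch showing that both forms give the same result on the string from the question:

```python
import re

parts = re.split(r"[\s?!,;]+", "Holy moly, feferoni!")

# filter(None, ...) keeps only truthy elements; list() materializes the result.
filtered = list(filter(None, parts))

# Equivalent list comprehension: "if s" is the same truthiness test.
comprehension = [s for s in parts if s]

print(filtered)
print(comprehension)
```

Both forms also remove an empty string at the start of the list, which appears when the input begins with a delimiter.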
Also note that I removed the escaping of special characters in the character class, as it is simply not necessary and just makes the RegEx harder to read.
Upvotes: 1
Reputation:
Expanding on what @HamZa said in his comment, you would use re.findall and a negated character set:
>>> from re import findall
>>> findall(r"[^\s?!,;]+", "Holy moly, feferoni!")
['Holy', 'moly', 'feferoni']
>>>
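The reason this avoids the empty element is that findall collects the runs of non-delimiter characters instead of cutting the string at the delimiters. A small sketch comparing the two on inputs with leading and trailing delimiters (the extra sample strings are made up for illustration):

```python
import re

text = "  Holy moly, feferoni!"

# split cuts at delimiter runs, so both ends produce empty strings here.
split_result = re.split(r"[\s?!,;]+", text)

# findall returns only the matched words, so there is nothing empty to clean up.
findall_result = re.findall(r"[^\s?!,;]+", text)

# A string made only of delimiters yields no matches at all.
only_delimiters = re.findall(r"[^\s?!,;]+", "?!, ;")

print(split_result)
print(findall_result)
print(only_delimiters)
```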
Upvotes: 2