Reputation: 309
Here's what I'm working with…
import re

string1 = "Dog,cat,mouse,bird. Human."

def string_count(text):
    # split on runs of non-word characters
    text = re.split(r'\W+', text)
    count = 0
    for x in text:
        count += 1
        print(count)
        print(x)
    return text

print(string_count(string1))
…and here's the output…
1
Dog
2
cat
3
mouse
4
bird
5
Human
6
['Dog', 'cat', 'mouse', 'bird', 'Human', '']
Why am I getting a 6 even though there are only 5 words? I can't seem to get rid of the ''
(empty string)! It's driving me insane.
Upvotes: 1
Views: 88
Reputation: 54173
Avinash Raj correctly stated WHY it's doing that. Here's how to fix it:
import re

string1 = "Dog,cat,mouse,bird. Human."
# include the word in the list only if it's not the empty string
the_list = [word for word in re.split(r'\W+', string1) if word]
Alternatively (and this is the better option):
import re

string1 = "Dog,cat,mouse,bird. Human."
# find all runs of word characters in string1
the_list = re.findall(r'\w+', string1)
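For what it's worth, here's a minimal sketch of the counting function from the question rebuilt around re.findall. The name count_words and the choice to return both the count and the list are my own assumptions, not something from the original post.

```python
import re

def count_words(text):
    # \w+ matches each run of word characters directly,
    # so there is no trailing empty string to filter out
    words = re.findall(r'\w+', text)
    return len(words), words

count, words = count_words("Dog,cat,mouse,bird. Human.")
print(count)   # 5
print(words)   # ['Dog', 'cat', 'mouse', 'bird', 'Human']
```

Since findall collects matches instead of cutting the string at separators, it doesn't matter whether the text starts or ends with punctuation.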
Upvotes: 1
Reputation: 174696
Because the string ends with a dot, and when re.split splits on that final dot it also returns the empty part that follows it.
You split the input string on \W+,
which means splitting on one or more non-word characters. Your regex therefore also matches the final dot and splits there. Since nothing comes after the final dot, the split returns an empty string as the last element.
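A quick way to see this is to compare inputs with and without a separator at the edge. This is just an illustrative sketch; the sample strings and variable names are made up for the demo.

```python
import re

# Trailing separator: nothing follows the last match, and re.split keeps that empty piece
with_dot = re.split(r'\W+', "Dog,cat. Human.")
# No trailing separator: the last piece is a real word
without_dot = re.split(r'\W+', "Dog,cat. Human")
# The same thing happens at the start if the string begins with a separator
leading = re.split(r'\W+', ".Dog,cat")

print(with_dot)     # ['Dog', 'cat', 'Human', '']
print(without_dot)  # ['Dog', 'cat', 'Human']
print(leading)      # ['', 'Dog', 'cat']
```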
Upvotes: 1