Reputation: 55
Good time of the day,
I am currently a little stuck on a challenge. I have to count the words in a phrase, splitting it on whitespace and on any special characters present.
import re
def word_count(string):
    counts = dict()
    regex = re.split(r" +|[\s+,._:+!&@$%^🖖]",string)
    for word in regex:
        word = str(word) if word.isdigit() else word
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1
    return counts
However, I am stuck on the regex part. When splitting, the empty strings between adjacent separators are also counted.
I started by using
for word in string.split():
But it does not pass the tests with phrases such as:
"car : carpet as java : javascript!!&@$%^&"
"hey,my_spacebar_is_broken."
'до🖖свидания!'
Hence, if I understand correctly, a regex is needed.
Thank you very much in advance!
Upvotes: 0
Views: 93
Reputation: 2219
Thanks to Olvin Roght for his suggestions. Your function can be elegantly reduced to this:
import re
from collections import Counter
def word_count(text):
    count = Counter(re.split(r"[\W_]+", text))
    del count['']
    return count
See Ryszard Czech's answer for an equivalent one-liner.
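To illustrate why the `del count['']` line is there, here is a minimal sketch using one of the phrases from the question. When the text ends (or starts) with a separator, `re.split` returns an empty string at that end, and `Counter` would count it as a word. Deleting a missing key from a `Counter` does not raise, so the `del` is safe even when no empty string is produced.

```python
import re
from collections import Counter

# Sample phrase taken from the question; it ends with a separator (".").
parts = re.split(r"[\W_]+", "hey,my_spacebar_is_broken.")
print(parts)  # ['hey', 'my', 'spacebar', 'is', 'broken', '']

count = Counter(parts)
del count['']  # drop the empty string left by the trailing separator
print(dict(count))  # {'hey': 1, 'my': 1, 'spacebar': 1, 'is': 1, 'broken': 1}
```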
Upvotes: 2
Reputation: 18611
Use
import re
from collections import Counter
def word_count(text):
    return Counter(re.findall(r"[^\W_]+", text))
[^\W_]+ matches one or more characters that are neither non-word characters nor underscores. In effect, this matches runs of one or more letters or digits.
See regex proof.
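As a quick check, the function above can be run against the three phrases from the question. This is only a sketch of the expected behavior. Because `re.findall` collects the matches rather than splitting around separators, no empty strings appear in the result.

```python
import re
from collections import Counter

def word_count(text):
    # Collect runs of letters/digits; separators are never returned.
    return Counter(re.findall(r"[^\W_]+", text))

print(dict(word_count("car : carpet as java : javascript!!&@$%^&")))
# {'car': 1, 'carpet': 1, 'as': 1, 'java': 1, 'javascript': 1}
print(dict(word_count("hey,my_spacebar_is_broken.")))
# {'hey': 1, 'my': 1, 'spacebar': 1, 'is': 1, 'broken': 1}
print(dict(word_count('до🖖свидания!')))
# {'до': 1, 'свидания': 1}
```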
Upvotes: 1
Reputation: 401
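Change the regex pattern as shown below. There is no need for ' +|' in the pattern, since you are already using '\s'. Also, note the '+' after the closing bracket, which makes a run of consecutive separators count as a single split point.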
regex = re.split(r"[\s+,._:+!&@$%^🖖]+", string)
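One caveat: with this pattern, `re.split` can still return an empty string when the phrase starts or ends with a separator. Below is a minimal sketch of how the pieces might be filtered. The helper name `split_words` is hypothetical and not part of the original code.

```python
import re

def split_words(string):
    # Hypothetical helper: split on runs of separators, then drop empty pieces.
    pieces = re.split(r"[\s+,._:+!&@$%^🖖]+", string)
    return [word for word in pieces if word]

print(split_words("car : carpet as java : javascript!!&@$%^&"))
# ['car', 'carpet', 'as', 'java', 'javascript']
print(split_words('до🖖свидания!'))
# ['до', 'свидания']
```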
Upvotes: 0