Reputation: 394
I want to count words using a dictionary, counting only the words themselves and not the punctuation attached to them.
For example, given this text:
Children can bye (paid) by credit card.
I want to count just paid, but my code counts (paid), parentheses included.
import re, sys
d = {}
m = "children can bye (paid) by credit card."
n = m.split()
for i in n:
    d[i] = 0
for j in n:
    d[j] = d[j] + 1
Any advice?
Upvotes: 1
Views: 93
Reputation: 54514
You can split the string on runs of non-word characters with a regex:
import re
n = re.split(r'\W+', m)  # raw string, so \W reaches the regex engine unchanged
You can check the syntax here.
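A minimal sketch of how this plugs into the counting loop from the question, assuming the same input string. Because the text ends with a period, re.split leaves an empty string at the end of the result, so the sketch skips empty pieces before counting. The names pieces and word_counts are only for illustration.

import re

m = "children can bye (paid) by credit card."

# Split on runs of non-word characters; the trailing "." produces an empty last piece
pieces = re.split(r'\W+', m)

# Count the non-empty pieces in a plain dict
word_counts = {}
for word in pieces:
    if word:  # skip the empty string produced at the end
        word_counts[word] = word_counts.get(word, 0) + 1

print(word_counts)
# {'children': 1, 'can': 1, 'bye': 1, 'paid': 1, 'by': 1, 'credit': 1, 'card': 1}

If you would rather avoid the empty strings entirely, re.findall(r'\w+', m) returns only the matched words. Note that \w also matches digits and underscores, and that an apostrophe splits a word like "don't" into "don" and "t".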
Upvotes: 2
Reputation: 17168
You just need to remove the punctuation from each individual token. Assuming you want to remove all punctuation, take a look at the string module, in particular string.punctuation. Then (for example) you can go through each token and strip out the punctuation characters, all in one list comprehension:
import string

words = [''.join(ch for ch in token if ch not in string.punctuation)
         for token in m.split()]
All this code does is run through each character (ch) in each token (the results of m.split()). It keeps every character except those in string.punctuation. Of course, if you want a different set of characters (say, you want to keep apostrophes), you can define that set yourself and use it instead.
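A sketch combining the list comprehension with the dictionary counting from the question, assuming you want to keep apostrophes. The set strip_chars (string.punctuation minus the apostrophe) is just an example choice, and the sample sentence is a hypothetical variant of the question's text with a contraction added to show the difference. Tokens made only of punctuation would end up as empty strings, so they are skipped when counting.

import string

# Example choice: strip all punctuation except the apostrophe
strip_chars = set(string.punctuation) - {"'"}

# Hypothetical input: the question's sentence with "can't" instead of "can"
m = "children can't bye (paid) by credit card."

# Remove unwanted characters from each whitespace-separated token
words = [''.join(ch for ch in token if ch not in strip_chars)
         for token in m.split()]

# Count the cleaned tokens, skipping any that were pure punctuation (now empty)
counts = {}
for w in words:
    if w:
        counts[w] = counts.get(w, 0) + 1

print(counts)
# {'children': 1, "can't": 1, 'bye': 1, 'paid': 1, 'by': 1, 'credit': 1, 'card': 1}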
Upvotes: 1