Mohammad Rahmani

Reputation: 394

How to count only the words that I want?

I want to count only words of a dictionary, without any surrounding punctuation.
For example, given this text:
Children can bye (paid) by credit card.
I want to count just paid, but my code counts (paid).

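import re, sys
d = {}
m = "children can bye (paid) by credit card."
n = m.split()
# Initialise every token's count to zero
for i in n:
    d[i] = 0
# Count each occurrence; tokens still carry punctuation, e.g. "(paid)"
for j in n:
    d[j] = d[j] + 1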

Any advice?

Upvotes: 1

Views: 93

Answers (2)

zs2020

Reputation: 54514

You can split the string on runs of non-word characters with the following regex:

import re
# Raw string so the backslash in \W reaches the regex engine unchanged
n = re.split(r'\W+', m)

You can check the syntax here.
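
To show how this could slot into the counting code from the question, here is a minimal sketch. Note that `re.split` can leave an empty string in the result when the text ends with punctuation (as the sample sentence does), so this version matches the words directly with `re.findall` and `\w+` instead. Using `re.findall` and `collections.Counter` is my assumption for illustration, not something the answer above prescribes.

```python
import re
from collections import Counter

m = "children can bye (paid) by credit card."

# \w+ matches runs of letters, digits, and underscores,
# so "(paid)" contributes the word "paid" and "card." contributes "card".
words = re.findall(r"\w+", m)

# Counter builds the word -> count mapping in one step.
counts = Counter(words)

print(counts["paid"])      # count for the bare word
print("(paid)" in counts)  # the punctuated form is no longer a key
```

If you would rather keep `re.split`, filtering out empty strings afterwards (for example `[w for w in n if w]`) should give the same word list.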

Upvotes: 2

Henry Keiter

Reputation: 17168

You just need to remove the punctuation from your individual tokens. Assuming you want to remove all punctuation, take a look at the string module, which provides string.punctuation. You can then go through each token and drop the punctuation characters, all in one list comprehension:

import string

words = [''.join(ch for ch in token if ch not in string.punctuation)
         for token in m.split()]

This code runs through each character (ch) in each token (the results of m.split()) and keeps every character that is not in string.punctuation. If you want to strip a different set of characters (say, you want to allow apostrophes), define that set yourself and use it in place of string.punctuation.
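
As a rough sketch of the apostrophe variant mentioned above, combined with the counting step from the question: the name `to_strip`, the modified sample sentence with "can't", and the use of `collections.Counter` are all assumptions made for illustration.

```python
import string
from collections import Counter

# Hypothetical custom set: all ASCII punctuation except the apostrophe.
to_strip = set(string.punctuation) - {"'"}

# Sample text from the question, with "can't" added to show apostrophes surviving.
m = "children can't bye (paid) by credit card."

# Remove unwanted characters from each whitespace-separated token.
words = [''.join(ch for ch in token if ch not in to_strip)
         for token in m.split()]

# Drop tokens that consisted only of punctuation and are now empty.
words = [w for w in words if w]

counts = Counter(words)
print(counts)
```

Building `to_strip` as a set keeps each membership check cheap, which only matters on longer texts but costs nothing here.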

Upvotes: 1
