ASUCB

Reputation: 61

How to count occurrences of a word followed by a special character in a text using Python regular expressions

I want to count the number of occurrences of the word 'people' in a text using Python. For that I use Counter and Python's regular expressions (the re module):

    for j in range(len(paragraphs)):
        text = paragraphs[j].text
        count[j] = Counter(re.findall(r'\bpeople\b' ,text))

Yet my code does not take into account occurrences such as people., people! or people?. How can I modify it to also count the cases where the word is followed by a specific character?

Thank you for your help,

Upvotes: 1

Views: 74

Answers (4)

Patrick Artner

Reputation: 51663

You can use an optional character-group in your regex:

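r'\bpeople\b[.,!?]?'

The \b right after people rejects longer words such as peoples, and [.,!?]? then optionally consumes one trailing punctuation mark. The \b has to come before the set: a \b placed after a punctuation mark only matches when a word character follows it, so before a space or at the end of the text the engine would backtrack and drop the punctuation. The ? specifies that the preceding element can occur 0 or 1 times; the [] specifies which characters are allowed. There is no need to escape . (or e.g. ()*+?) inside [], even though these characters have a special meaning elsewhere in a regex. If you want a literal - inside [], escape it (or put it first or last in the set), because - is used to denote ranges: [1-5] is the same as [12345].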
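To see what these set rules mean in practice, here is a small sketch using Python's re module; the sample strings are made up for illustration.

```python
import re

sample = "a.b-c"

# Inside [], "." is a literal dot, so only the dot is matched.
dots_in_set = re.findall(r"[.]", sample)
# Outside [], "." matches any character except a newline.
any_char = re.findall(r".", sample)
# An escaped "-" inside [] is a literal hyphen, not a range.
hyphen_or_dot = re.findall(r"[.\-]", sample)
# Unescaped between two characters, "-" denotes a range: [1-5] == [12345].
digits = re.findall(r"[1-5]", "0123456")

print(dots_in_set)    # ['.']
print(any_char)       # ['a', '.', 'b', '-', 'c']
print(hyphen_or_dot)  # ['.', '-']
print(digits)         # ['1', '2', '3', '4', '5']
```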

See: https://docs.python.org/3/library/re.html#regular-expression-syntax

[] Used to indicate a set of characters. In a set:

Characters can be listed individually, e.g. [amk] will match 'a', 'm', or 'k'. Ranges of characters can be indicated by giving two characters and separating them by a '-', for example [a-z] will match any lowercase ASCII letter, [0-5][0-9] will match all the two-digits numbers from 00 to 59, and [0-9A-Fa-f] will match any hexadecimal digit. [...]
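A minimal sketch of counting the variants with Counter, assuming the text is a plain string (the sample sentence is invented). It also contrasts where the trailing \b goes: a \b placed after the optional set only matches when a word character follows the punctuation, so before a space or at the end of the text the engine backtracks and returns plain people.

```python
import re
from collections import Counter

text = "people. people! people? people, peoples people"

# \b directly after "people" rejects "peoples"; the optional set then
# consumes at most one trailing punctuation mark.
good = re.compile(r"\bpeople\b[.,!?]?")
# With \b after the set, "people." followed by a space fails the final \b,
# so the engine backtracks and matches plain "people" instead.
pitfall = re.compile(r"\bpeople[.,!?]?\b")

good_matches = good.findall(text)
pitfall_matches = pitfall.findall(text)
counts = Counter(good_matches)

print(good_matches)     # ['people.', 'people!', 'people?', 'people,', 'people']
print(pitfall_matches)  # ['people', 'people', 'people', 'people', 'people']
print(counts["people."], sum(counts.values()))  # 1 5
```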

Upvotes: 2

Rick

Reputation: 45261

Does it have to use regex? Why not just:

len(text.split("people"))-1
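Note that this counts raw substrings, not words. A quick sketch on an invented sentence shows that str.split and str.count agree, that peoples is counted as well, and that the case-sensitive search skips People; a \b regex counts only the standalone word.

```python
import re

text = "People say people, peoples and people!"

# Splitting on a substring yields one more piece than there are occurrences.
split_count = len(text.split("people")) - 1
# str.count returns the same number of non-overlapping occurrences directly.
str_count = text.count("people")
# A word-boundary regex only counts the standalone word (still case-sensitive).
word_count = len(re.findall(r"\bpeople\b", text))

print(split_count, str_count, word_count)  # 3 3 2
```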

Upvotes: 0

Clay Raynor

Reputation: 316

You can add an optional character set after the 'people' part of your regex pattern. Try the following:

import re
from collections import Counter

for j in range(len(paragraphs)):
    text = paragraphs[j].text
    count[j] = Counter(re.findall(r'\bpeople\b[.?!]?', text))

The ? quantifier means zero or one occurrence. The above pattern seems to work on regex101.com, but I haven't tried it out in a Python shell yet.
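A quick check of why the position of the r prefix matters: with r inside the quotes, Python treats the literal as an ordinary string that starts with the letter r, and \b becomes the backspace character \x08, so the pattern no longer matches. The sample text is made up; the second pattern is a raw string with \b placed before the optional set.

```python
import re

# r inside the quotes: an ordinary string, and "\b" is a backspace (\x08).
broken = 'r\bpeople[.?!]?\b'
# r before the quote: a raw string, so "\b" reaches re as a word boundary.
fixed = r'\bpeople\b[.?!]?'

text = "people. people! people?"
broken_matches = re.findall(broken, text)
fixed_matches = re.findall(fixed, text)

print(repr(broken[:2]))  # 'r\x08'
print(broken_matches)    # []
print(fixed_matches)     # ['people.', 'people!', 'people?']
```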

Upvotes: 1

SPYBUG96

Reputation: 1117

people[?.!]

This matches only people?, people. or people! (the word followed by exactly one of those characters).

So with a few more Counter(re.findall(...)) calls, you can count each variant separately:

import re
from collections import Counter

# Separate names, so each result is kept instead of overwriting the previous one

# Matches "people" followed by a whitespace character (included in the match)
plain = Counter(re.findall(r'people\s', text))

# Matches only people?
question = Counter(re.findall(r'people\?', text))

# Matches only people.
period = Counter(re.findall(r'people\.', text))

# Matches only people!
exclamation = Counter(re.findall(r'people\!', text))

Use \ to escape characters that are special in a regex, such as ? and .; escaping ! is harmless but not required.
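Instead of one findall call per punctuation mark, a single pattern with a capture group can tally every variant in one pass. In this sketch (sample text invented), re.findall returns only the captured group, so the Counter is keyed by the character that follows people, with an empty string meaning no punctuation.

```python
import re
from collections import Counter

text = "people? people. people! people people."

# The group captures the (possibly empty) character after the whole word
# "people"; findall returns just that group for every match.
suffixes = re.findall(r"\bpeople\b([?.!]?)", text)
by_suffix = Counter(suffixes)

print(suffixes)  # ['?', '.', '!', '', '.']
print(by_suffix["?"], by_suffix["."], by_suffix["!"], by_suffix[""])  # 1 2 1 1
```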

Also, https://pythex.org/ is a good resource when you are experimenting with Python regular expressions. The site also has a regular expression cheat sheet.

Upvotes: 1
