Raja Uzair Saeed

Reputation: 21

How to make re.findall case insensitive?

I'm trying to find specific words in a string using re.findall. The code works, but it is case sensitive and I need the matching to be case insensitive. I have tried setting the re.IGNORECASE flag, but it made no difference. Kindly help me out. Here is my code:

import re
from collections import Counter
vocab = ['Chrome', 'Mozilla', 'Opera', 'iPhone', 'Spider']
with open('Assignment_log.txt', 'r') as file:
    data = file.read().replace('\n', '')
wordcount = dict((x,0) for x in vocab)

for w in re.findall(r"\w+", data, re.IGNORECASE):
    if w in wordcount:
        wordcount[w] += 1

wordcount = Counter(wordcount)
print(wordcount)

Output: Counter({'Mozilla': 339, 'Chrome': 35, 'Opera': 16, 'iPhone': 2, 'Spider': 0})

Here 'spider' is matched case sensitively, so I get a count of zero for it.

Upvotes: 2

Views: 2048

Answers (2)

PieCot

Reputation: 3639

I think the problem is the line "if w in wordcount:". If the case of the matched word differs from the key you used to build the dictionary, that occurrence is skipped. Alternatively, you can search for all occurrences of each item in vocab:

wordcount = {
    w: len(re.findall(r"(^|\s+){}(\s+|$)".format(w), data, re.IGNORECASE))
    for w in vocab
}

You can customize the regular expression to satisfy the conditions you want to meet.
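As one possible customization, here is a minimal sketch that swaps the whitespace groups for \b word boundaries and passes each word through re.escape. The function name count_vocab and the sample string are made up for illustration. Because \b does not consume the surrounding whitespace, adjacent occurrences such as "chrome chrome" are both counted. Note that \b also treats punctuation as a boundary, so "Mozilla/5.0" counts as a hit for Mozilla, which may or may not be what you want.

```python
import re

def count_vocab(text, vocab):
    """Count case-insensitive whole-word occurrences of each vocab entry."""
    counts = {}
    for word in vocab:
        # re.escape guards against regex metacharacters in the vocabulary;
        # \b anchors the match at word boundaries without consuming whitespace.
        pattern = r"\b{}\b".format(re.escape(word))
        counts[word] = len(re.findall(pattern, text, re.IGNORECASE))
    return counts

# Hypothetical sample text, not taken from the asker's log file.
sample = "Mozilla/5.0 chrome CHROME spider Spiderman iPhone"
print(count_vocab(sample, ["Chrome", "Mozilla", "Opera", "iPhone", "Spider"]))
# {'Chrome': 2, 'Mozilla': 1, 'Opera': 0, 'iPhone': 1, 'Spider': 1}
```

"Spiderman" is not counted for Spider because there is no word boundary after the "r".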

Upvotes: 1

Tim Biegeleisen

Reputation: 521239

Try lowercasing both your vocabulary list and each match you get from the call to re.findall:

import re
from collections import Counter

vocab = ['chrome', 'mozilla', 'opera', 'iphone', 'spider']
with open('Assignment_log.txt', 'r') as file:
    data = file.read().replace('\n', '')
wordcount = dict((x,0) for x in vocab)

for w in re.findall(r"\w+", data):
    key = w.lower()  # normalize the match so it lines up with the lowercase keys
    if key in wordcount:
        wordcount[key] += 1

wordcount = Counter(wordcount)
print(wordcount)

Note that since you are only searching for \w+ in your call to re.findall, the re.IGNORECASE flag has no effect there: \w already matches both uppercase and lowercase letters. Just lowercase each word you find before comparing it against the vocabulary list.
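If you would rather keep the original spelling of the vocabulary (for example 'iPhone') in the output, a small variation is to map each lowercase form back to its original entry. This is only a sketch: the helper name count_words and the sample string are assumptions for illustration, not part of the original code. It also checks that the flag really changes nothing for \w+.

```python
import re
from collections import Counter

def count_words(text, vocab):
    # Map lowercase form -> original spelling so the report keeps 'iPhone' etc.
    canonical = {word.lower(): word for word in vocab}
    counts = Counter({word: 0 for word in vocab})
    for token in re.findall(r"\w+", text):
        key = token.lower()
        if key in canonical:
            counts[canonical[key]] += 1
    return counts

# Hypothetical sample text, not taken from the asker's log file.
sample = "Spider spider SPIDER chrome iPhone"

# \w+ already matches letters of either case, so the flag changes nothing here.
assert re.findall(r"\w+", sample) == re.findall(r"\w+", sample, re.IGNORECASE)

counts = count_words(sample, ["Chrome", "Mozilla", "Opera", "iPhone", "Spider"])
print(counts["Spider"], counts["Chrome"], counts["iPhone"])  # 3 1 1
```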

Upvotes: 1
