tenebris silentio

Reputation: 519

Python Regex - Summing the total number of times strings from a list appear in a separate string

Regex and text data noob here.

I have a list of terms and I want to get a single sum of the total times the strings from my list appear in a separate string. In the example below, the letter "o" appears 3 times in my string and the letter "b" appears 2 times. I've created a variable called allcount which I know doesn't work, but ideally would have a total sum of 5.

Any help is appreciated.

import re
mylist = ['o', 'b']
my_string = 'Bob is cool'
onecount = len(re.findall('o', my_string)) #this works

#allcount = sum(len(re.findall(mylist, my_string))) #this doesn't work 

Upvotes: 0

Views: 45

Answers (3)

Gonçalo Peres

Reputation: 13582

There is no need to import additional libraries, nor to use a regular expression.

Considering that mylist and my_string look as follows

mylist = ['o', 'B']
my_string = 'Bob is cool'

Assuming the goal is to obtain a list with the count in the same order that the strings appear in the list mylist, one can do the following

newlist = [my_string.lower().count(x.lower()) for x in mylist]

[Out]: [3, 2]

If, on the other hand, the goal is to obtain a dictionary where the keys are the strings from mylist and the values are the number of times they appear in my_string, one can do the following

newdict = {x.lower():my_string.lower().count(x.lower()) for x in mylist}

[Out]: {'o': 3, 'b': 2}

Notes:

  • Calling .lower() on both my_string and each term makes the count case insensitive.
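
Since the question asks for a single total rather than per-term counts, the same str.count approach can be summed directly. This is a minimal sketch that reuses mylist and my_string from the question and borrows the name allcount from it; the variable lowered is only an illustrative helper.

```python
mylist = ['o', 'b']
my_string = 'Bob is cool'

# Lowercase the string once so the comparison is case insensitive,
# then add up the per-term counts.
lowered = my_string.lower()
allcount = sum(lowered.count(term.lower()) for term in mylist)

print(allcount)  # 5
```

Note that str.count counts non-overlapping occurrences of each term independently, so a character can contribute to more than one term if the terms overlap.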

Upvotes: 1

yatu

Reputation: 88236

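Building a Counter once and then looking up each list element avoids rescanning the string for every term. Note that a Counter built from a string counts single characters, so this only works when every term in mylist is one character long: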

from collections import Counter

c = Counter(my_string.lower())
# Counter({'o': 3, 'b': 2, ' ': 2, 'i': 1, 's': 1, 'c': 1, 'l': 1})
[c[s] for s in mylist]
# [3, 2]
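
To get the single total the question asks for, the lookups can be wrapped in sum. A minimal sketch, assuming every term in mylist is a single character (a Counter built from a string only counts individual characters); allcount is the name used in the question.

```python
from collections import Counter

mylist = ['o', 'b']
my_string = 'Bob is cool'

# Count every character once, case insensitively.
c = Counter(my_string.lower())

# A Counter returns 0 for missing keys, so terms that never appear are safe.
allcount = sum(c[term.lower()] for term in mylist)

print(allcount)  # 5
```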

Upvotes: 1

mlokos

Reputation: 409

You need to match several alternative patterns in your string.

This can be done with the pipe sign |, which acts as alternation ("or") in a regular expression.

import re

mylist = ['o', 'b']
my_string = 'Bob is cool'
onecount = len(re.findall("o|b|B", my_string)) #this works

print(onecount)
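
Rather than typing the alternation by hand, the pattern could be built from mylist itself. The following is a sketch under the assumption that the terms should be matched literally and case insensitively; re.escape and re.IGNORECASE are standard re features, while the names pattern and allcount are only illustrative.

```python
import re

mylist = ['o', 'b']
my_string = 'Bob is cool'

# re.escape protects terms that contain regex metacharacters such as '.' or '+'.
pattern = '|'.join(re.escape(term) for term in mylist)

# re.IGNORECASE removes the need to list both 'b' and 'B'.
allcount = len(re.findall(pattern, my_string, flags=re.IGNORECASE))

print(allcount)  # 5
```

Keep in mind that re.findall returns non-overlapping matches, so if the terms overlap (for example 'o' and 'oo') the total can differ from summing the count of each term separately.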

Upvotes: 1
