Christina Zhou

Reputation: 1863

Count times a regex pattern appears in a list of strings

Say I have a list of schools:

schools = [
    '00A000',
    '01A000',
    '00B000',
    '01B000',
    '00C000',
    '01C000'
]

I'm doing some data exploration, and the first thing I want to do is count all the schools matching %A% (that is, with an A in the middle).

I assumed I could use something like the command below:

schools.count('\BA')

But it looks like the only way I can do that using a regex is with the re module:

[re.findall(r'\BA', x) for x in schools].count(['A'])

Is that the easiest way to do it?

Full code:

import re

schools = [
    '00A000',
    '01A000',
    '00B000',
    '01B000',
    '00C000',
    '01C000'
]

# Data exploration. Find count of all district A schools.

# I thought I could pass some kind of regex string to list's built-in count:
schools.count(r'\BA')
# The call above does not work: count tests for equality, so it returns 0.

# It looks like I must loop over the list with a regex and then count the results, right?
[re.findall(r'\BA', x) for x in schools].count(['A'])

# Repeat for B and C...

Upvotes: 2

Views: 1463

Answers (3)

petre

Reputation: 1543

If you really want to match "xyAuv" but not "Axyuv" or "xyuvA", you could drop regular expressions altogether and use:

len([1 for school in schools if 'A' in school[1:-1]])

If an 'A' anywhere in the string would do, just test 'A' in school instead.
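To make the difference between the two tests concrete, here is a minimal sketch. The two extra entries ('A0D000' and '01B00A') and the helper names count_inner and count_anywhere are made up for illustration; they are not part of the question.

```python
# Sample data: the original six schools plus two hypothetical edge cases
# where the 'A' sits at the very start or the very end of the string.
schools = [
    '00A000', '01A000', '00B000', '01B000',
    '00C000', '01C000', 'A0D000', '01B00A',
]


def count_inner(letter, names):
    """Count names containing `letter` anywhere except the first or last position."""
    return len([1 for name in names if letter in name[1:-1]])


def count_anywhere(letter, names):
    """Count names containing `letter` at any position."""
    return len([1 for name in names if letter in name])


inner_a = count_inner('A', schools)
any_a = count_anywhere('A', schools)
print(inner_a, any_a)  # prints: 2 4
```

The slice school[1:-1] drops the first and last characters, which is why the two edge-case entries are excluded from the first count.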

A funnier way to write it is:

sum('A' in school for school in schools)

This works because True counts as 1 when summed, but it may be confusing and it is a bit slower.

Or:

from functools import reduce
from operator import add

reduce(add, ('A' in school for school in schools))

Which is funny but a bit faster.
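If you want to check the speed claims on your own machine, something like the following rough sketch should do. The wrapper function names and the repetition count are arbitrary choices of mine, and the timings will vary between Python versions and systems, so treat the numbers as indicative only.

```python
from functools import reduce
from operator import add
from timeit import timeit

schools = ['00A000', '01A000', '00B000', '01B000', '00C000', '01C000']


def with_len():
    # Build a throwaway list and take its length.
    return len([1 for school in schools if 'A' in school])


def with_sum():
    # Sum booleans directly: True counts as 1, False as 0.
    return sum('A' in school for school in schools)


def with_reduce():
    # Fold the booleans together with operator.add.
    return reduce(add, ('A' in school for school in schools))


variants = (with_len, with_sum, with_reduce)

# All three variants should agree on the count before timing means anything.
results = {func.__name__: func() for func in variants}
timings = {func.__name__: timeit(func, number=10_000) for func in variants}

for name, count in results.items():
    print(f"{name}: count={count}, {timings[name]:.4f}s for 10,000 runs")
```

One caveat with the reduce version: without an initial value it raises TypeError on an empty list, so pass 0 as a third argument if schools can be empty.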

Upvotes: 1

accdias

Reputation: 5372

As I said in my comment, I would go with:

len(re.findall(r'\BA\B', ','.join(schools)))

Here is a proof of concept:

Python 3.7.6 (default, Dec 19 2019, 22:52:49) 
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> schools = [
...     '00A000',
...     '01A000',
...     '00B000',
...     '01B000',
...     '00C000',
...     '01C000',
...     'A0D000',
...     '01B00A'
... ]
>>> 
>>> len(re.findall('\BA\B', ','.join(schools)))
2
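One thing to keep in mind: findall on the joined string counts occurrences, not schools. A minimal sketch of the difference follows, assuming a made-up entry '0AA000' with two inner A's; that entry and the variable names are mine, for illustration only.

```python
import re

# '0AA000' is a hypothetical entry with two inner A's.
schools = ['00A000', '01A000', '0AA000', 'A0D000', '01B00A']

pattern = re.compile(r'\BA\B')  # an 'A' with a word character on both sides

# Counts every matching 'A' in the joined string.
occurrences = len(pattern.findall(','.join(schools)))

# Counts each school at most once, however many inner A's it has.
matching_schools = sum(1 for school in schools if pattern.search(school))

print(occurrences, matching_schools)  # prints: 4 3
```

If every code has at most one letter, as in the question's data, both approaches give the same number.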

Upvotes: 0

LeoE

Reputation: 2083

How about joining the list into a single string and counting the number of occurrences:

import re
print(len(re.findall(r'\BA', ','.join(schools))))

Output:

2
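Since the question mentions repeating the count for B and C, here is a rough sketch that tallies every letter in one pass instead. It assumes each code holds a single uppercase ASCII letter that is not the first or last character; the names inner_letter and district_counts are just illustrative.

```python
import re
from collections import Counter

schools = ['00A000', '01A000', '00B000', '01B000', '00C000', '01C000']

# Assumption: the district is one uppercase letter surrounded by word characters.
inner_letter = re.compile(r'\B[A-Z]\B')

district_counts = Counter()
for school in schools:
    match = inner_letter.search(school)
    if match:  # skip codes without an inner letter
        district_counts[match.group()] += 1

for letter, count in sorted(district_counts.items()):
    print(letter, count)
# prints:
# A 2
# B 2
# C 2
```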

Upvotes: 0
