Reputation: 21

Regex count and print from a column

I am trying to count the values in a column that match a regex and print the number found, but the code below keeps giving me 0. I suspect it is not iterating through the whole column. My code is below.

import re

pattern = ('/^[A-Z]{1}\d{8}$/i')
numbers = jan_df['Student Number']

iterator = re.finditer(pattern, str(numbers))
count = 0

for match in iterator:
    count+=1
print(count)

Upvotes: 2

Answers (1)

Wiktor Stribiżew

Reputation: 627101

You can use:

df.loc[df['Student Number'].str.contains(r'^[A-Za-z]\d{8}$'), :].shape[0]

Or, if you plan to use a more specific regex and need to make it case insensitive:

df.loc[df['Student Number'].str.contains(r'^[A-Z]\d{8}$', case=False), :].shape[0]

# or

df.loc[df['Student Number'].str.contains(r'(?i)^[A-Z]\d{8}$'), :].shape[0]
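To see the one-liners above in action, here is a minimal sketch on a small made-up frame. The sample values are hypothetical (the real jan_df is not shown in the question), and na=False is an added assumption so that missing values count as non-matches instead of breaking the boolean mask.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's jan_df.
df = pd.DataFrame(
    {"Student Number": ["A12345678", "b87654321", "12345678", "C1234", None]}
)

pattern = r"^[A-Z]\d{8}$"

# Boolean mask: True where the value matches, ignoring case.
# na=False (an assumption, not in the answer) treats missing values as non-matches.
mask = df["Student Number"].str.contains(pattern, case=False, na=False)

# Count by filtering rows, as in the answer ...
count_via_shape = df.loc[mask, :].shape[0]

# ... or by summing the mask directly, since True counts as 1.
count_via_sum = int(mask.sum())

print(count_via_shape, count_via_sum)
```

Summing the mask avoids building the filtered frame when only the count is needed; both approaches should give the same number.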

Notes:

In Python, a regex is defined with a string literal, not a regex literal, so the /.../i notation does not work. Write the pattern as a plain string ('...') and set flags through options (such as case=False) or as inline flags ((?i)...).
{1} is always redundant in regex patterns, so remove it.
Series.str.contains returns a boolean Series that is True wherever the value contains a match. df.loc[df[col].str.contains(...), :] returns only the rows where a match was found.
DataFrame.shape returns the dimensions of the data frame, so .shape[0] is the number of rows.
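The first note also explains the 0 in the question, and it can be checked with the standard re module alone. The sketch below uses hypothetical values in place of the real column. Separately, str(numbers) in the question turns the whole Series into one printed string (index included), so the ^ and $ anchors would not line up with individual values even with a correct pattern.

```python
import re

# Hypothetical values standing in for the 'Student Number' column.
values = ["A12345678", "b87654321", "12345678", "C1234"]

# JavaScript-style literal pasted into a Python string: the slashes and the
# trailing "i" are treated as literal characters, not as delimiters and a flag.
js_style = re.compile(r"/^[A-Z]{1}\d{8}$/i")

# Python equivalents: a flag argument, or an inline flag at the start.
with_flag = re.compile(r"^[A-Z]\d{8}$", re.IGNORECASE)
inline_flag = re.compile(r"(?i)^[A-Z]\d{8}$")

# Test each value on its own rather than one big string.
js_count = sum(1 for v in values if js_style.search(v))
flag_count = sum(1 for v in values if with_flag.search(v))
inline_count = sum(1 for v in values if inline_flag.search(v))

print(js_count, flag_count, inline_count)
```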


Upvotes: 1
