user19230076

Reputation: 21

Regex count and print from a column

I am trying to count the values in a column that match a regex and print the number found, but the code below keeps giving me 0. I suspect it is not iterating through the whole column. My code is below.

import re

pattern = ('/^[A-Z]{1}\d{8}$/i')
numbers = jan_df['Student Number']

iterator = re.finditer(pattern, str(numbers))
count = 0

for match in iterator:
    count+=1
print(count)

Upvotes: 2

Views: 895

Answers (1)

Wiktor Stribiżew

Reputation: 627101

You can use

df.loc[df['Student Number'].str.contains(r'^[A-Za-z]\d{8}$'), :].shape[0]

Or, if you plan to use a more specific regex and need to make it case insensitive:

df.loc[df['Student Number'].str.contains(r'^[A-Z]\d{8}$', case=False), :].shape[0]

# or

df.loc[df['Student Number'].str.contains(r'(?i)^[A-Z]\d{8}$'), :].shape[0]
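To compare the three variants side by side, here is a minimal sketch on made-up data. The sample values and the helper name count_matches are assumptions for illustration only; the real column would come from the asker's own data frame.

```python
import pandas as pd

# Hypothetical sample data standing in for the real 'Student Number' column.
df = pd.DataFrame({
    "Student Number": [
        "A12345678",   # one uppercase letter + 8 digits
        "b87654321",   # one lowercase letter + 8 digits
        "12345678",    # no leading letter
        "C1234567",    # only 7 digits
        "AB12345678",  # two leading letters
    ]
})


def count_matches(frame, column, pattern, case=True):
    """Count rows whose value in `column` matches `pattern` (illustrative helper)."""
    mask = frame[column].str.contains(pattern, case=case)
    return frame.loc[mask, :].shape[0]


# Explicit character class covering both cases.
count_any_case = count_matches(df, "Student Number", r'^[A-Za-z]\d{8}$')
# Uppercase-only class, case-sensitive.
count_upper_only = count_matches(df, "Student Number", r'^[A-Z]\d{8}$')
# Uppercase-only class made case-insensitive through the case option.
count_case_option = count_matches(df, "Student Number", r'^[A-Z]\d{8}$', case=False)
# Uppercase-only class made case-insensitive through an inline flag.
count_inline_flag = count_matches(df, "Student Number", r'(?i)^[A-Z]\d{8}$')

print(count_any_case, count_upper_only, count_case_option, count_inline_flag)
```

Since the mask is a boolean Series, mask.sum() should give the same count without building the filtered frame, if only the number is needed.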

Notes:

  • Python defines regex patterns with string literals, not regex literals, so the /.../i syntax is not available. Pass the pattern as a plain (preferably raw) string and set flags either as options or as inline flags ((?i)...).
  • {1} is always redundant in regex patterns; remove it.
  • Series.str.contains returns a boolean Series that is True wherever the pattern matches a value. df.loc[df[col].str.contains(...), :] returns only the rows where a match was found.
  • DataFrame.shape returns the dimensions of the data frame as a (rows, columns) tuple, so .shape[0] is the number of rows.
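As a rough illustration of why the code in the question prints 0, the sketch below reproduces it on a small made-up Series. The sample values are assumptions. It also shows a second problem: str(numbers) produces the printed representation of the whole Series (index, values and a dtype footer in one string), so an anchored pattern cannot match it even after the slashes are removed.

```python
import re
import pandas as pd

# Hypothetical sample column standing in for jan_df['Student Number'].
numbers = pd.Series(["A12345678", "b87654321", "12345678"], name="Student Number")

# The asker's pattern, written here as a raw string. In Python the slashes
# and the trailing "i" are literal characters, not delimiters and a flag.
js_style = re.compile(r'/^[A-Z]{1}\d{8}$/i')
series_text = str(numbers)  # printed form of the Series, not the individual values
js_style_count = sum(1 for _ in js_style.finditer(series_text))

# Corrected pattern: no delimiters, no {1}, flag passed as an option.
pattern = re.compile(r'^[A-Z]\d{8}$', re.IGNORECASE)

# Still 0 on the printed form, because the text begins with the index label.
printed_form_count = sum(1 for _ in pattern.finditer(series_text))

# Testing each value separately gives the expected count.
per_value_count = sum(1 for value in numbers if pattern.search(value))

print(js_style_count, printed_form_count, per_value_count)
```

The per-value loop is only meant to make the behaviour visible; the vectorized Series.str.contains approach from the answer is the more idiomatic way to get the same number.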


Upvotes: 1
