Reputation: 21
I am trying to count regex matches in a column and print the number found, but the code below keeps giving me 0. I have a feeling it's not iterating through the whole column. My code is below.
import re
pattern = ('/^[A-Z]{1}\d{8}$/i')
numbers = jan_df['Student Number']
iterator = re.finditer(pattern, str(numbers))
count = 0
for match in iterator:
    count += 1
print(count)
Upvotes: 2
Views: 895
Reputation: 627101
You can use
df.loc[df['Student Number'].str.contains(r'^[A-Za-z]\d{8}$'), :].shape[0]
Or, if you plan to use a more specific regex and need to make it case insensitive:
df.loc[df['Student Number'].str.contains(r'^[A-Z]\d{8}$', case=False), :].shape[0]
# or
df.loc[df['Student Number'].str.contains(r'(?i)^[A-Z]\d{8}$'), :].shape[0]
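For a quick check, here is a minimal sketch of the same count on made-up data. The sample values are hypothetical (only the column name comes from the question), and na=False is an addition of mine so that missing values count as non-matches instead of breaking the boolean mask.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame(
    {"Student Number": ["A12345678", "b87654321", "12345678", "C1234", None]}
)

pattern = r"^[A-Z]\d{8}$"

# One True/False per row; na=False turns missing values into False.
mask = df["Student Number"].str.contains(pattern, case=False, na=False)

# Count by filtering the rows and reading the row dimension.
count_via_shape = df.loc[mask, :].shape[0]

# Equivalent count: True values sum as 1.
count_via_sum = int(mask.sum())

print(count_via_shape, count_via_sum)
```

Summing the mask avoids building the filtered frame when only the count is needed.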
Notes:
- Python has no regex literal syntax, so the /.../i thing does not work: you need ... with flags as options, or as inline flags ((?i)...)
- {1} is always redundant in regex patterns, please remove it
- Series.str.contains returns True or False for each row depending on whether there is a match. df.loc[df[col].str.contains(...), :] only returns those rows where the match was found
- DataFrame.shape returns the dimensions of the data frame, so .shape[0] returns the number of rows.
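To illustrate the note about /.../i with the plain re module, here is a small sketch on hypothetical values. It assumes you loop over the individual values; the question's code instead passed str(numbers), which is the printed form of the whole Series rather than one student number at a time.

```python
import re

# Hypothetical sample values, not taken from the question's data.
values = ["A12345678", "b87654321", "12345678"]

# In Python the slashes and the trailing "i" are ordinary pattern characters,
# so this pattern can never match a bare student number.
js_style = re.compile(r"/^[A-Z]{1}\d{8}$/i")
js_style_count = sum(1 for v in values if js_style.search(v))

# Case insensitivity passed as a flag option.
with_flag = re.compile(r"^[A-Z]\d{8}$", re.IGNORECASE)
flag_count = sum(1 for v in values if with_flag.search(v))

# The same behavior with an inline flag inside the pattern.
inline = re.compile(r"(?i)^[A-Z]\d{8}$")
inline_count = sum(1 for v in values if inline.search(v))

print(js_style_count, flag_count, inline_count)
```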
Upvotes: 1