Reputation: 125
I have a dataframe in which I need to check if the content in each cell of a column follows a particular format.
Index Column A
0 abcd
1 abc_1
2 abc_xy
3 abc_12
4 zabc_12
How can I find the cells that match the format 'abc_' + number, so that the values at index 1 and 3 are found?
So far I know how I can look for the 'abc_' or numeric part of the cell using regex:
re.match('abc_', df['Column A'])
But I am not sure how to look for the complete pattern. Any help will be appreciated, thank you!
Upvotes: 2
Views: 2500
Reputation: 627110
You may use Series.str.contains:
df['Column A'].str.contains(r'^abc_\d')
Or, if the pattern should match the whole string:
df['Column A'].str.contains(r'^abc_\d+$')
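To see how the two patterns differ, here is a minimal sketch. It uses the data from the question plus one extra value, 'abc_1x', which is an assumption added only to show the difference:

```python
import pandas as pd

# Sample data from the question, plus 'abc_1x' (added for illustration only).
df = pd.DataFrame(
    {'Column A': ['abcd', 'abc_1', 'abc_xy', 'abc_12', 'zabc_12', 'abc_1x']}
)

# Prefix match: 'abc_' followed by at least one digit; anything may follow.
prefix_mask = df['Column A'].str.contains(r'^abc_\d')

# Whole-string match: 'abc_' followed by digits only.
full_mask = df['Column A'].str.contains(r'^abc_\d+$')

print(df[prefix_mask].index.tolist())  # [1, 3, 5]
print(df[full_mask].index.tolist())    # [1, 3]
```

Both masks are boolean Series, so they can be used directly to filter the dataframe, as in df[full_mask].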
Note that by default, the pat argument is treated as a regex, so you do not have to use regex=True. You may use the na argument to define a fill value for missing values.
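As a rough illustration of the na argument, the sketch below uses a small hypothetical Series with a missing value (not taken from the question). Depending on the pandas version and dtype, a missing entry may otherwise come back as missing rather than False, so passing na=False is a common way to get a clean boolean mask:

```python
import pandas as pd

# Hypothetical data with a missing value; not part of the original question.
s = pd.Series(['abc_1', None, 'abc_xy'])

# na=False treats missing entries as non-matches, giving a pure boolean mask.
mask = s.str.contains(r'^abc_\d+$', na=False)

print(mask.tolist())     # [True, False, False]
print(s[mask].tolist())  # ['abc_1']
```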
Pattern details
^ - start of string (you need it here as str.contains uses re.search, not re.match, and thus does not anchor the match at the start of the string)
abc_ - a literal substring
\d+ - 1+ digits
$ - end of string.
Upvotes: 5