Reputation: 131
I am trying to find all rows in a pandas DataFrame for which the column col takes values of the format 1234-XX-YYY, where XX is a placeholder for any two capital letters (A-Z) and YYY is a placeholder for any three digits (0-9).
Here is my code so far:
df[df['col'].str.contains('^1234-\[A-Z]{2}\[d]{3}', na=False)]
How can I achieve the desired result?
Upvotes: 1
Views: 666
Reputation: 627082
When you escape an opening [ you tell the regex engine to match it as a literal character. If you expect a - to appear at some place in the string, you need to add it to the pattern. Also, if you expect uppercase letters to appear, you need A-Z, not a-z.
Use
^1234-[A-Z]{2}-[0-9]{3}$
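Applied to the filter from the question, this could look like the minimal sketch below. The sample rows are made up for illustration; only the column name col and the na=False argument come from the question. The str.fullmatch variant is an optional alternative (available in pandas 1.1 and later) that requires the whole value to match, so the ^ and $ anchors can be dropped.

```python
import pandas as pd

# Hypothetical sample data; only the column name "col" comes from the question.
df = pd.DataFrame({
    "col": [
        "1234-AB-123",   # valid
        "1234-ab-123",   # lowercase letters
        "1234-ABC-123",  # three letters instead of two
        "1234-AB-1234",  # four digits instead of three
        "x1234-AB-123",  # extra leading character
        None,            # missing value
    ]
})

# Raw string so that backslashes and braces reach the regex engine unchanged.
pattern = r"^1234-[A-Z]{2}-[0-9]{3}$"

# na=False treats missing values as non-matches instead of propagating NaN.
matches = df[df["col"].str.contains(pattern, na=False)]

# Alternative: fullmatch anchors the pattern at both ends implicitly.
strict = df[df["col"].str.fullmatch(r"1234-[A-Z]{2}-[0-9]{3}", na=False)]

print(matches)
```

Both filters keep only the first sample row here.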
Details
^ - start of string
1234- - a literal string
[A-Z]{2} - two uppercase letters
- - a hyphen
[0-9]{3} - three digits
$ - end of string.
Upvotes: 1