lala_12

Reputation: 131

Find specific sequence of characters combining number and letters using regex in Python pandas

I am trying to find all rows in a pandas DataFrame for which the column col takes values of the format 1234-XX-YYY, where XX is a placeholder for any two capital letters (A-Z) and YYY is a placeholder for any three digits (0-9).

Here is my code so far:

df[df['col'].str.contains('^1234-\[A-Z]{2}\[d]{3}', na=False)]

How can I achieve the desired result?

Upvotes: 1

Views: 666

Answers (1)

Wiktor Stribiżew

Reputation: 627082

When you escape an opening [, you tell the regex engine to match it as a literal character instead of starting a character class. Likewise, [d] matches the letter d, not a digit; use [0-9] or \d for digits. If you expect a - to appear at some place in the string, you need to add it to the pattern. Also, if you expect uppercase letters to appear, you need A-Z, not a-z.
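
To see the effect of the escaped brackets, here is a minimal sketch using Python's built-in re module. The two sample strings are made up for illustration. It suggests that the pattern from the question looks for literal brackets, so a value in the intended format is not matched.

```python
import re

# Pattern from the question: \[ is a literal "[", so "A-Z]" and "d]" are
# read as literal text too, and {2}/{3} repeat the "]" characters.
question_pattern = re.compile(r'^1234-\[A-Z]{2}\[d]{3}')

intended = '1234-AB-567'        # made-up value in the desired format
literal = '1234-[A-Z]][d]]]'    # made-up string containing real brackets

matches_intended = question_pattern.search(intended) is not None
matches_literal = question_pattern.search(literal) is not None

print(matches_intended)  # False
print(matches_literal)   # True
```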

Use

^1234-[A-Z]{2}-[0-9]{3}$

Details

  • ^ - start of string
  • 1234- - a literal string
  • [A-Z]{2} - two uppercase letters
  • - - a hyphen
  • [0-9]{3} - three digits
  • $ - end of string.
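
As a rough sketch of how the pattern above could be used with the filter from the question, assuming a DataFrame with a column named col: the sample values below are hypothetical and only serve to show which rows are kept.

```python
import pandas as pd

# Hypothetical sample data; only the first row has the 1234-XX-YYY format.
df = pd.DataFrame({'col': ['1234-AB-567', '1234-ab-567', '1234-AB567',
                           '1234-AB-5678', None]})

pattern = r'^1234-[A-Z]{2}-[0-9]{3}$'

# na=False treats missing values as non-matches, so the mask is purely boolean.
mask = df['col'].str.contains(pattern, na=False)
result = df[mask]

print(result['col'].tolist())  # ['1234-AB-567']
```

Because the pattern is anchored with ^ and $, rows with extra characters before or after the expected format are left out.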

Upvotes: 1
