Elsa Li

Reputation: 775

Regular expression must start with a letter and end with digits in Python

I'm trying to extract a string pattern using python:

  1. must start with the capital letter 'C'

  2. the middle part can contain an optional /

  3. must end with one or two digits

example strings:

193 skol C/12
334 skol C/6
577 skol C12
345 skol C6

expected matching results will be:

C/12
C/6
C12
C6

This is what my regular expression looks like:

df['a'].str.extract('^[C]\/?\d{1,2}$') 

However, it doesn't produce the expected results. I tried adding "[ ]" to the regular expression, but it still doesn't work :( Can anyone please give me some suggestions? Thanks so much!

Upvotes: 1

Views: 9489

Answers (3)

Hack-R

Reputation: 23200

import pandas as pd
a = pd.Series(['193 skol C/12','334 skol C/6','577 skol C12','345 skol C6'])

# raw string avoids invalid-escape warnings; expand=False returns a Series
a.str.extract(r'(C\/?\d+)', expand=False)
0    C/12
1     C/6
2     C12
3      C6

Why it works:

( Capturing group #1. Groups multiple tokens together and creates a capture group for extracting a substring or using a backreference.

C Character. Matches a "C" character (char code 67).

\/ Escaped character. Matches a "/" character (char code 47). The backslash is optional here, because / has no special meaning in Python regular expressions.

? Optional. Match between 0 and 1 of the preceding token.

\d Digit. Matches any digit character (0-9).

+ Plus. Match 1 or more of the preceding token.

) End of capturing group #1.
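Note that \d+ accepts any number of digits, while the question asks for one or two digits at the end. A minimal sketch of a stricter variant, assuming the same Series as above plus one hypothetical three-digit row ('100 skol C123') added only for illustration:

```python
import pandas as pd

# Sample data from the question plus one hypothetical three-digit row.
a = pd.Series(['193 skol C/12', '334 skol C/6', '577 skol C12',
               '345 skol C6', '100 skol C123'])

# \d+ accepts any number of digits, so the extra row matches as well.
loose = a.str.extract(r'(C/?\d+)', expand=False)

# \d{1,2}$ allows one or two digits and requires them to end the string,
# so the three-digit row yields NaN instead of a match.
strict = a.str.extract(r'(C/?\d{1,2})$', expand=False)

print(pd.DataFrame({'loose': loose, 'strict': strict}))
```

Which variant is appropriate depends on whether longer digit runs can appear in the real data.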

Upvotes: 0

marvel308

Reputation: 10458

You can use the regex

C\/?\d{1,2}

see the regex demo
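A quick check with the standard re module suggests why the pattern from the question finds nothing: its leading ^ requires the match to begin at the start of the string, but every sample begins with digits. The sketch below compares the two patterns on the question's sample strings; the variable names are illustrative only. When using this pattern with pandas str.extract, it would also need to be wrapped in parentheses, since extract expects at least one capture group.

```python
import re

samples = ["193 skol C/12", "334 skol C/6", "577 skol C12", "345 skol C6"]

# Pattern from the question: anchored at both ends of the string.
anchored = re.compile(r"^[C]/?\d{1,2}$")
# Pattern from this answer: free to match anywhere in the string.
unanchored = re.compile(r"C/?\d{1,2}")

# True/False per sample: does the anchored pattern match at all?
anchored_hits = [bool(anchored.search(s)) for s in samples]
# The matched substring per sample for the unanchored pattern.
unanchored_hits = [unanchored.search(s).group(0) for s in samples]

print(anchored_hits)
print(unanchored_hits)
```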

Upvotes: 1

Abe

Reputation: 1415

Try this:

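(C(?:\/|)\d{1,2})$

C - Match a literal uppercase C (no backslash: \C is not a valid escape in Python's re module)
(?:\/|) - Match a literal / or nothing (pipe symbol with an empty alternative), without capturing it
\d{1,2}$ - Match one or two digits at the end of the string
The outer parentheses capture the whole match, which is what str.extract returns.

Code:

df['a'].str.extract(r'(C(?:\/|)\d{1,2})$')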
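Two details of patterns like this one can be checked with the standard re module alone: whether a backslash in front of C is accepted, and what a capture group placed around only the slash returns. This is a small sketch on one sample string from the question; the variable names are illustrative.

```python
import re

# An unknown escape made of a backslash and an ASCII letter is rejected
# by the re module in current Python versions.
try:
    re.compile(r"\C")
    backslash_c_ok = True
except re.error:
    backslash_c_ok = False

text = "193 skol C/12"

# Group around the slash only: group(1) holds just the slash.
inner_only = re.search(r"C(\/|)\d{1,2}$", text).group(1)

# Group around the whole token, with a non-capturing group for the slash.
whole = re.search(r"(C(?:\/|)\d{1,2})$", text).group(1)

print(backslash_c_ok, repr(inner_only), repr(whole))
```

Because str.extract returns the contents of the capture groups, the group placement decides whether the result is the full token or only the slash.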

Upvotes: 2
