Reputation: 775
I'm trying to extract a string pattern using Python. The pattern:
must start with a capital letter 'C'
the middle part may contain a /
must end with one or two digits
Example strings:
193 skol C/12
334 skol C/6
577 skol C12
345 skol C6
Expected matches:
C/12
C/6
C12
C6
This is what my regular expression looks like:
df['a'].str.extract('^[C]\/?\d{1,2}$')
However, it doesn't produce the expected results. I tried adding "[ ]" to the regular expression, but it still doesn't work :( Can anyone give me some suggestions? Thanks so much!
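A minimal check with Python's re module shows where the pattern in the question goes wrong. pandas' str.extract works on the first match in each string (search semantics), so plain re is used here to keep the sketch dependency-free. The ^ anchor forces the C to be the very first character, but in every example it comes after "NNN skol ". Separately, str.extract raises a ValueError when the pattern contains no capture group. The names samples, anchored and unanchored are illustrative only.

```python
import re

samples = ["193 skol C/12", "334 skol C/6", "577 skol C12", "345 skol C6"]

# The pattern from the question: ^ pins the match to the start of the string
anchored = re.compile(r"^[C]\/?\d{1,2}$")
print([anchored.search(s) for s in samples])  # [None, None, None, None]

# Dropping ^ lets the match start anywhere; $ still requires the digits to end
# the string, and the parentheses add the capture group str.extract needs
unanchored = re.compile(r"(C\/?\d{1,2})$")
print([unanchored.search(s).group(1) for s in samples])  # ['C/12', 'C/6', 'C12', 'C6']
```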
Upvotes: 1
Views: 9489
Reputation: 23200
import pandas as pd
a = pd.Series(['193 skol C/12', '334 skol C/6', '577 skol C12', '345 skol C6'])
# Raw string so \/ and \d reach the regex engine unchanged; the group is what gets extracted
a.str.extract(r'(C\/?\d+)')
0  C/12
1   C/6
2   C12
3    C6
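Why it works: there is no ^ anchor, so the match can start after "193 skol ", and the pattern is wrapped in a capture group, which str.extract requires. Token by token: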
(   Capturing group #1. Groups multiple tokens together and creates a capture group for extracting a substring or using a backreference.
C   Character. Matches a "C" character (char code 67).
\/  Escaped character. Matches a "/" character (char code 47).
?   Optional. Matches 0 or 1 of the preceding token.
\d  Digit. Matches any digit character (0-9).
+   Plus. Matches 1 or more of the preceding token.
)   Closes capturing group #1.
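One caveat with (C\/?\d+): \d+ accepts any number of digits, whereas the question asks for one or two. A stricter sketch, assuming the code always sits at the end of the string as in the examples, swaps in \d{1,2} plus a $ anchor. The sample "999 skol C123" is hypothetical, added only to show the difference.

```python
import re

samples = ["193 skol C/12", "577 skol C6", "999 skol C123"]  # last one is hypothetical

loose = re.compile(r"(C\/?\d+)")         # pattern from this answer
strict = re.compile(r"(C\/?\d{1,2})$")   # one or two digits, then end of string

for s in samples:
    m_loose = loose.search(s)
    m_strict = strict.search(s)
    # loose accepts "C123"; strict rejects it because a third digit precedes the end
    print(s, "->", m_loose.group(1), "|", m_strict.group(1) if m_strict else None)
```

In pandas the stricter pattern would be passed the same way, e.g. a.str.extract(r'(C\/?\d{1,2})$'); rows without a match come back as NaN.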
Upvotes: 0
Reputation: 1415
Try this:
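(C(?:\/|)\d{1,2})$
C
- Match a literal uppercase C (written without a backslash: in Python 3.7+ an unknown escape such as \C raises re.error)
(?:\/|)
- Match a literal / or nothing: the pipe symbol with an empty alternative after it makes the slash optional, like \/? (the ?: keeps this group non-capturing)
\d{1,2}$
- Match one or two digits at the end of the string
( ... )
- The outer capturing group is what str.extract returns, so the whole code (e.g. C/12) is extracted rather than just the slash
Code:
df['a'].str.extract(r'(C(?:\/|)\d{1,2})$')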
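A quick check of the pattern \C(\/|)\d{1,2}$ with Python's re module, assuming Python 3.7 or later: an escaped ASCII letter with no defined meaning, such as \C, is rejected when the pattern is compiled. And because str.extract returns the capture groups rather than the whole match, a pattern whose only group is the optional slash yields "/" or an empty string instead of the code. Wrapping the whole code in a group, with the slash alternation made non-capturing, returns the full match. The names slash_only and whole are illustrative.

```python
import re

# "\C" is not a defined escape; Python 3.7+ raises re.error at compile time
try:
    re.compile(r"\C(\/|)\d{1,2}$")
    print("compiled")
except re.error as exc:
    print("re.error:", exc)

# Without the backslash, the only group is (\/|): it captures "/" or ""
slash_only = re.compile(r"C(\/|)\d{1,2}$")
print([slash_only.search(s).group(1) for s in ["193 skol C/12", "577 skol C12"]])  # ['/', '']

# An outer group captures the whole code; (?:\/|) keeps the alternation non-capturing
whole = re.compile(r"(C(?:\/|)\d{1,2})$")
print([whole.search(s).group(1) for s in ["193 skol C/12", "577 skol C12"]])  # ['C/12', 'C12']
```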
Upvotes: 2