How to extract substring with regex

Question

I have SKUs like the following:

SBC225SLB32
SBA2161BRB30
PBA632AS32

The first 3-4 characters are letters (A-Z) and the next 3-4 characters are digits (0-9). I need to extract both parts.

For the first, I tried \D{3,4} and for the second, I tried \d{3,4}.

But when I used pandas' .str.extract('\D{3,4}'), I got a "pattern contains no capture groups" error. Is there a better way to do this?

Wiktor Stribiżew · Accepted Answer

In your case, it is more convenient to grab both values at once with the help of two capturing groups. You can use

df[['Code1', 'Code2']] = df['SKU'].str.extract(r'^([A-Z]{3,4})([0-9]{3,4})', expand=False)
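The error in the question comes from the fact that .str.extract only returns what is inside capture groups, so a pattern like \D{3,4} with no parentheses has nothing to return. Below is a minimal sketch on the three sample SKUs from the question; the DataFrame and the column names SKU, Code1 and Code2 are assumptions for illustration.

```python
import pandas as pd

# Sample data taken from the question; the column name "SKU" is assumed.
df = pd.DataFrame({"SKU": ["SBC225SLB32", "SBA2161BRB30", "PBA632AS32"]})

# Two capture groups: extract returns a DataFrame with one column per group.
parts = df["SKU"].str.extract(r"^([A-Z]{3,4})([0-9]{3,4})")
df["Code1"] = parts[0]  # leading letters
df["Code2"] = parts[1]  # digits that follow, kept as strings

# Wrapping the original pattern in parentheses is enough to avoid the error.
# With a single group, expand=False returns a Series instead of a DataFrame.
letters = df["SKU"].str.extract(r"^(\D{3,4})", expand=False)

print(df)
```

Rows that do not match the pattern get NaN in the extracted columns, so it may be worth checking for missing values before converting Code2 to a numeric type.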

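Pattern details:

^ - start of the string
([A-Z]{3,4}) - capturing group 1: three or four uppercase ASCII letters
([0-9]{3,4}) - capturing group 2: three or four digits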
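To check the pattern outside pandas, the same regex can be tried with Python's built-in re module. This is only a sketch; the helper name split_sku is hypothetical and not part of the original answer.

```python
import re

# Same pattern as in the answer, compiled once for reuse.
SKU_PATTERN = re.compile(r"^([A-Z]{3,4})([0-9]{3,4})")


def split_sku(sku):
    """Return (letters, digits) from the start of sku, or None if it does not match."""
    match = SKU_PATTERN.match(sku)
    return match.groups() if match else None


for sku in ["SBC225SLB32", "SBA2161BRB30", "PBA632AS32"]:
    print(sku, split_sku(sku))
```

Each group captures only its own part, which is why the answer can assign the result to two columns at once.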
