Reputation: 87
I have SKUs like the following:
SBC225SLB32
SBA2161BRB30
PBA632AS32
The first 3-4 characters are letters (A-Z), which must be extracted, and the following 3-4 characters are digits (0-9), which also have to be extracted.
For the first, I tried \D{3,4} and for the second, I tried \d{3,4}.
But when using pandas' .str.extract('\D{3,4}'), I got a "pattern contains no capture groups" error.
Is there a better way to do this?
Upvotes: 1
Views: 105
Reputation: 626689
The regex pattern you pass to Series.str.extract contains no capturing groups, while the method expects at least one.
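To illustrate the point, here is a minimal sketch that reproduces the error and then avoids it by wrapping the same pattern in parentheses. The Series name skus and the variables error_message and letters are made up for this example; the three SKUs are the ones from the question.

```python
import pandas as pd

# Sample SKUs taken from the question; the variable names are illustrative only.
skus = pd.Series(["SBC225SLB32", "SBA2161BRB30", "PBA632AS32"])

# Without parentheses there is no capture group, so str.extract raises ValueError.
try:
    skus.str.extract(r"\D{3,4}")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# Wrapping the pattern in parentheses creates one capture group.
# With a single group and expand=False, the result is a Series.
letters = skus.str.extract(r"(\D{3,4})", expand=False)
print(letters.tolist())  # ['SBC', 'SBA', 'PBA']
```

Note that \D matches any non-digit character, not only A-Z, so [A-Z] is the stricter choice if the prefix must be uppercase letters.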
In your case, it is more convenient to grab both values at once with the help of two capturing groups. You can use
df[['Code1', 'Code2']] = df['SKU'].str.extract(r'^([A-Z]{3,4})([0-9]{3,4})', expand=False)
See the regex demo. Pattern details:
^ - start of string
([A-Z]{3,4}) - Capturing group 1: three to four uppercase ASCII letters
([0-9]{3,4}) - Capturing group 2: three to four ASCII digits.
Upvotes: 2
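As a runnable check of the two-group approach, the sketch below applies the same pattern to the SKUs from the question. It uses named groups so that pandas takes the column names from the pattern, which is a variation on the answer's code rather than the answer's exact line; the DataFrame name df and the column names SKU, Code1 and Code2 are assumed from the snippet above.

```python
import pandas as pd

# Sample data from the question; column name "SKU" is assumed from the answer's snippet.
df = pd.DataFrame({"SKU": ["SBC225SLB32", "SBA2161BRB30", "PBA632AS32"]})

# Named groups (?P<name>...) become the column names of the extracted DataFrame.
pattern = r"^(?P<Code1>[A-Z]{3,4})(?P<Code2>[0-9]{3,4})"
parts = df["SKU"].str.extract(pattern)

# Attach the extracted columns to the original frame.
df = df.join(parts)
print(df)
```

The extracted values are strings; if Code2 is needed as a number, it can be converted afterwards, for example with pd.to_numeric. Rows that do not match the pattern get NaN in both columns.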