MHA
MHA

Reputation: 69

Pandas str.extract() a number that ends in a letter

I have a pandas column like below:

      df['description']
0.   PRAIRIE HIGHLANDS SIXTH PLAT Lt: 156 PIN# DP73770000 0156 312 ABC
1.   PRAIRIE VILLAGE   PIN# OP55000034 0020A Rmrk: PT OF
2.   Sub: HOLLY GREEN Lt: 14 Bl: 1 PIN# DP34500001 0D14
3.   FAIRWAY   PIN# GP20000006 0029 Rmrk: W

I want to extract out PIN# such that expected results:

DP73770000 0156 312
OP55000034 0020A
DP34500001 0D14
GP20000006 0029

What I have tried:

df['PIN'] = df['description'].str.extract(r'\b(?:PIN# ?) (\w+(?:(?:\s*[ ])+\s*\d+)*)\b')

Results I got:

DP73770000 0156 312
OP55000034
DP34500001
GP20000006 0029

The ones that have letters in between the numbers after space are not getting picked up. I have tried a couple other regex as well to no success. How can I resolve this? Please help.

Upvotes: 1

Views: 84

Answers (1)

Tim Biegeleisen
Tim Biegeleisen

Reputation: 520948

I would use str.extract as follows:

df["PIN"] = df["description"].str.extract(r'PIN#((?: [A-Z0-9]*[0-9][A-Z0-9]*)*)')

Here is a link to a running regex demo showing that the logic is working.

Upvotes: 1

Related Questions