How to extract characters from strings using a common preceding pattern?

Question

I'm trying to use the sub function to isolate the Parcel Number from a messy string variable. The parcel number is always preceded by "ParNum:" within the string. The characters around the desired number vary, but they follow the general form of these two examples.

string1 <- "Legal Description:  PrpId:0511552031 ParNum:0511552031 CC:05 T:7 R:8"
string2 <- "Legal Description:  Rmrk:PT OF PrpId:0511552031 ParNum:0511552031 CC:05 T:7 R:8"

Miserable failed effort:

> sub("[^:]*:)*[^:]*:", "", string1)
[1] "0511552031 ParNum:0511552031 CC:05 T:7 R:8"

Desired result:

0511552031

Tim Biegeleisen · Accepted Answer

Try using the following pattern with sub:

.*ParNum:([^[:blank:]]*).*

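This matches ParNum: and then captures any run of non-space, non-tab characters that follows it. The leading and trailing .* consume the rest of the string, so the replacement keeps only the first capture group, referenced as \1. Inside an R string literal the backslash must be doubled, so the replacement is written "\\1".

Code snippet:

string1 <- "Legal Description:  PrpId:0511552031 ParNum:0511552031 CC:05 T:7 R:8"
sub(".*ParNum:([^[:blank:]]*).*", "\\1", string1)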
[1] "0511552031"
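
As a rough sketch, the same pattern can be applied to both example strings at once, since sub is vectorised over its input. The second approach below is an alternative, not part of the answer above: it uses regmatches with regexpr and a Perl-style lookbehind to pull out the match directly. The variable names strings, parcel_sub and parcel_match are made up for illustration.

```r
string1 <- "Legal Description:  PrpId:0511552031 ParNum:0511552031 CC:05 T:7 R:8"
string2 <- "Legal Description:  Rmrk:PT OF PrpId:0511552031 ParNum:0511552031 CC:05 T:7 R:8"
strings <- c(string1, string2)  # hypothetical vector holding both examples

# Approach from the answer: replace the whole string with capture group 1.
parcel_sub <- sub(".*ParNum:([^[:blank:]]*).*", "\\1", strings)

# Alternative (assumption, not from the answer): extract the match itself
# using a lookbehind, which needs perl = TRUE.
parcel_match <- regmatches(strings,
                           regexpr("(?<=ParNum:)\\S+", strings, perl = TRUE))

print(parcel_sub)
print(parcel_match)
```

One difference worth keeping in mind: when a string contains no "ParNum:", sub returns that string unchanged, whereas regmatches drops it from the result, so the output can be shorter than the input.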
