akorn

Reputation: 70

How to extract characters from strings using a common preceding pattern?

I'm trying to use the sub function to isolate the parcel number from a messy string variable. The parcel numbers are identified within the string by a preceding "ParNum:". The characters around the desired number vary, but they follow the general form of these two examples:

string1 <- "Legal Description:  PrpId:0511552031 ParNum:0511552031 CC:05 T:7 R:8"
string2 <- "Legal Description:  Rmrk:PT OF PrpId:0511552031 ParNum:0511552031 CC:05 T:7 R:8"

My (miserably) failed attempt:

> sub("[^:]*:)*[^:]*:", "", string1)
[1] "0511552031 ParNum:0511552031 CC:05 T:7 R:8"

Desired result:

0511552031

Upvotes: 2

Views: 43

Answers (2)

Chthonyx

Reputation: 707

I find this easier to do with the stringr package from the tidyverse. (In fact, a question like this is what first prompted me to install stringr.)

library(stringr)

string1 <- "Legal Description:  PrpId:0511552031 ParNum:0511552031 CC:05 T:7 R:8"
string2 <- "Legal Description:  Rmrk:PT OF PrpId:0511552031 ParNum:0511552031 CC:05 T:7 R:8"

str_extract(string1, "(?<=ParNum:)[^[:blank:]]*")
# [1] "0511552031"

Both str_extract and sub are vectorized, so the following also works:

strings <- c(string1, string2)
str_extract(strings, "(?<=ParNum:)[^[:blank:]]*")
# [1] "0511552031" "0511552031"
sub(".*ParNum:([^[:blank:]]*).*", "\\1", strings)
# [1] "0511552031" "0511552031"
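The two calls differ when a string has no "ParNum:" at all: str_extract returns NA, but sub returns the input unchanged, so the whole description would end up in the result. A minimal base-R sketch of guarding against that, assuming a hypothetical third input with no ParNum field:

```r
# Third element is a hypothetical input with no "ParNum:" field.
strings <- c(
  "Legal Description:  PrpId:0511552031 ParNum:0511552031 CC:05 T:7 R:8",
  "Legal Description:  Rmrk:PT OF PrpId:0511552031 ParNum:0511552031 CC:05 T:7 R:8",
  "Legal Description:  PrpId:0511552031 CC:05 T:7 R:8"
)

pattern <- ".*ParNum:([^[:blank:]]*).*"

# sub() leaves a non-matching string untouched, so raw[3] is the full description.
raw <- sub(pattern, "\\1", strings)

# Keep the substituted value only where the pattern matched; NA elsewhere.
par_num <- ifelse(grepl(pattern, strings), raw, NA_character_)
par_num
```

Here par_num holds the parcel number for the first two strings and NA for the third, which matches what str_extract would give.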

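The (?<=...) construct is a regex lookbehind: the match must be preceded by ParNum:, but ParNum: itself is not included in the extracted text. This site has more information about lookarounds.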
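If you would rather stay in base R, the same lookbehind pattern should work with regexpr/regmatches, as far as I know only with perl = TRUE, since base R's default regex engine does not support lookbehind. A sketch:

```r
strings <- c(
  "Legal Description:  PrpId:0511552031 ParNum:0511552031 CC:05 T:7 R:8",
  "Legal Description:  Rmrk:PT OF PrpId:0511552031 ParNum:0511552031 CC:05 T:7 R:8"
)

# perl = TRUE switches to PCRE, which understands (?<=ParNum:).
m <- regexpr("(?<=ParNum:)[^[:blank:]]*", strings, perl = TRUE)

# regmatches() returns only the matched pieces.
par_num <- regmatches(strings, m)
par_num
```

One caveat: regmatches drops elements that did not match instead of returning NA, so the result can be shorter than the input when some strings lack "ParNum:".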

Upvotes: 1

Tim Biegeleisen

Reputation: 521599

Try using the following pattern with sub:

.*ParNum:([^[:blank:]]*).*

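The leading .* consumes everything before ParNum:, the group ([^[:blank:]]*) captures the run of non-blank (non-space, non-tab) characters that follows it, and the trailing .* consumes the rest of the string. Because the whole string is matched, replacing it with the first capture group, \\1, leaves only the parcel number.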
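A related base-R option, sketched here as an alternative rather than part of the answer above, is regexec, which reports capture-group positions directly, so the pattern does not need the leading and trailing .*:

```r
strings <- c(
  "Legal Description:  PrpId:0511552031 ParNum:0511552031 CC:05 T:7 R:8",
  "Legal Description:  Rmrk:PT OF PrpId:0511552031 ParNum:0511552031 CC:05 T:7 R:8"
)

# regexec() records the full match plus each capture group.
m <- regexec("ParNum:([^[:blank:]]*)", strings)

# regmatches() gives a list of c(full_match, group_1) per string;
# take element 2, the captured parcel number.
par_num <- vapply(regmatches(strings, m), `[`, character(1), 2)
par_num
```

For a string with no match, regmatches yields character(0), and indexing that at position 2 gives NA, so non-matching inputs come back as NA rather than being dropped.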

Code snippet:

string1 <- "Legal Description:  PrpId:0511552031 ParNum:0511552031 CC:05 T:7 R:8"
sub(".*ParNum:([^[:blank:]]*).*", "\\1", string1)
# [1] "0511552031"


Upvotes: 4
