Reputation: 70
I'm trying to use the sub function to isolate the parcel number from a messy string variable. The parcel numbers are identified within the string by a preceding "ParNum:". The characters around the desired number vary, but they follow the general form of these two examples.
string1 <- "Legal Description: PrpId:0511552031 ParNum:0511552031 CC:05 T:7 R:8"
string2 <- "Legal Description: Rmrk:PT OF PrpId:0511552031 ParNum:0511552031 CC:05 T:7 R:8"
Miserable failed effort:
> sub("[^:]*:)*[^:]*:", "", string1)
[1] "0511552031 ParNum:0511552031 CC:05 T:7 R:8"
Desired result:
0511552031
Upvotes: 2
Views: 43
Reputation: 707
I find this easier to do with the stringr package from the tidyverse. (In fact, a question like this was what first prompted me to install stringr.)
library(stringr)
string1 <- "Legal Description: PrpId:0511552031 ParNum:0511552031 CC:05 T:7 R:8"
string2 <- "Legal Description: Rmrk:PT OF PrpId:0511552031 ParNum:0511552031 CC:05 T:7 R:8"
str_extract(string1, "(?<=ParNum:)[^[:blank:]]*")
# [1] "0511552031"
Also, str_extract and sub are vectorized, so the following works:
strings <- c(string1, string2)
str_extract(strings, "(?<=ParNum:)[^[:blank:]]*")
# [1] "0511552031" "0511552031"
sub(".*ParNum:([^[:blank:]]*).*", "\\1", strings)
# [1] "0511552031" "0511552031"
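The (?<=...) construct is regex syntax for a lookbehind: here it requires ParNum: immediately before the match without making it part of the result. This site has more information about lookarounds.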
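If you would rather stay in base R, the same lookbehind should also work with regexpr and regmatches, provided perl = TRUE is set (the default regex engine does not support lookarounds). A minimal sketch; the names m and parnum are only for illustration:

```r
string1 <- "Legal Description: PrpId:0511552031 ParNum:0511552031 CC:05 T:7 R:8"

# perl = TRUE switches to the PCRE engine, which understands (?<=...)
m <- regexpr("(?<=ParNum:)[^[:blank:]]*", string1, perl = TRUE)

# regmatches() pulls out the substring that regexpr() located
parnum <- regmatches(string1, m)
parnum
# [1] "0511552031"
```

Note that, on a vector, regmatches drops elements with no match, so the result can be shorter than the input.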
Upvotes: 1
Reputation: 521599
Try using the following pattern with sub:
.*ParNum:([^[:blank:]]*).*
This matches ParNum:, and then captures any non-space/tab characters which follow it. The captured number is then made available in the first capture group as \\1.
Code snippet:
string1 <- "Legal Description: PrpId:0511552031 ParNum:0511552031 CC:05 T:7 R:8"
sub(".*ParNum:([^[:blank:]]*).*", "\\1", string1)
[1] "0511552031"
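One caveat worth keeping in mind: when a string contains no ParNum: at all, sub finds nothing to replace and returns the whole input unchanged. A rough sketch of a guard that returns NA instead, where extract_parnum is a hypothetical helper name and not part of base R:

```r
# Hypothetical helper: returns the parcel number, or NA when "ParNum:" is absent
extract_parnum <- function(x) {
  pattern <- ".*ParNum:([^[:blank:]]*).*"
  has_parnum <- grepl("ParNum:", x, fixed = TRUE)
  ifelse(has_parnum, sub(pattern, "\\1", x), NA_character_)
}

string1 <- "Legal Description: PrpId:0511552031 ParNum:0511552031 CC:05 T:7 R:8"
result <- extract_parnum(c(string1, "Legal Description: CC:05 T:7 R:8"))
result
# [1] "0511552031" NA
```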
Upvotes: 4