Reputation: 497
I want to extract the substrings that match a pattern from my vector of strings:
string <- c(
"P10000101 - Przychody netto ze sprzedazy produktów",
"P10000102_PL - Przychody nettozy uslug",
"P1000010201_PL - Handlowych, marketingowych, szkoleniowych",
"P100001020101 - - Handlowych,, szkoleniowych - refaktury",
"- Handlowych, marketingowych,P100001020102, - pozostale"
)
As a result, I want to get only the part of each string that matches the regular expression:
result <- c(
"P10000101",
"P10000102_PL",
"P1000010201_PL",
"P100001020101",
"P100001020102"
)
I tried the pattern "([PLA]\\d+)" with different combinations of value = T, fixed = T, and perl = T, for example:
grep(x = string, pattern = "([PLA]\\d+(_PL)?)", fixed = T)
Upvotes: 3
Views: 608
Reputation: 887911
We can use str_extract from the stringr package:
library(stringr)
str_extract(string, "P\\d+(_[A-Z]+)*")
#[1] "P10000101" "P10000102_PL" "P1000010201_PL" "P100001020101" "P100001020102"
grep is for finding whether the pattern is present in a particular string or not. For extraction, use sub, gregexpr/regmatches, or str_extract.
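To illustrate the difference between detecting and extracting, here is a minimal sketch using three of the strings from the question. The sub() call is only one possible way to write it, and the variable names (idx, whole, literal, codes) are made up for this example.

```r
# Three strings taken from the question
string <- c(
  "P10000102_PL - Przychody nettozy uslug",
  "P100001020101 - - Handlowych,, szkoleniowych - refaktury",
  "- Handlowych, marketingowych,P100001020102, - pozostale"
)
pattern <- "P\\d+(_[A-Z]+)*"

# grep() returns the indices of the elements that contain a match
idx <- grep(pattern, string)

# value = TRUE returns the whole matching elements, not the matched part
whole <- grep(pattern, string, value = TRUE)

# fixed = TRUE treats the pattern as a literal string, so nothing matches
literal <- grep(pattern, string, fixed = TRUE)

# sub() can extract by capturing the match and dropping everything else;
# perl = TRUE is used here for the lazy quantifier .*?
codes <- sub("^.*?(P\\d+(_[A-Z]+)*).*$", "\\1", string, perl = TRUE)
print(codes)
```

Note that sub() returns an element unchanged when the pattern does not match it, so this approach is only safe if every element is known to contain a code.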
Using base R (regexpr/regmatches):
regmatches(string, regexpr("P\\d+(_[A-Z]+)*", string))
#[1] "P10000101" "P10000102_PL" "P1000010201_PL" "P100001020101" "P100001020102"
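One thing to keep in mind: regmatches() with regexpr() silently drops elements that have no match, so the result can be shorter than the input, whereas str_extract returns NA for them. A small sketch of how the result could be kept aligned with the input in base R; the vector x and its second element are invented for illustration.

```r
# Hypothetical input: the second element contains no code
x <- c(
  "P10000101 - first",
  "no code here",
  "- Handlowych, marketingowych,P100001020102, - pozostale"
)

# regexpr() returns the position of the first match, or -1 if there is none
m <- regexpr("P\\d+(_[A-Z]+)*", x)

# Start with NA everywhere, then fill in only the elements that matched
out <- rep(NA_character_, length(x))
out[m != -1] <- regmatches(x, m)
print(out)
```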
Basically, the pattern to match is P followed by one or more digits (\\d+), followed by zero or more greedy repetitions (*) of _ and one or more upper case letters.
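To see what the * after the group does, it can be compared with ? (at most one suffix). This is just a sketch; the second element of x is a made-up string with two suffixes and does not come from the question.

```r
x <- c(
  "P10000102_PL - Przychody nettozy uslug",  # from the question
  "P10000102_PL_EN - hypothetical"           # invented: two suffixes
)

# (_[A-Z]+)* matches zero or more "_LETTERS" suffixes
star <- regmatches(x, regexpr("P\\d+(_[A-Z]+)*", x))

# (_[A-Z]+)? matches at most one suffix
opt <- regmatches(x, regexpr("P\\d+(_[A-Z]+)?", x))

print(star)
print(opt)
```

For the data in the question both variants give the same result, since no code has more than one suffix.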
Upvotes: 6