Reputation: 2072
I want to extract some information from file names using a regex. Given this vector of strings:
ss <- c("africa_AF_1_20_perc_threshold_in_MOD44B.MRTWEB.A2000065.051.Percent_Tree_Cover.tif_Patch_areas",
        "africa_AF_1_25_perc_threshold_in_MOD44B.MRTWEB.A2000065.051.Percent_Tree_Cover.tif_Patch_areas",
        "africa_AF_1_30_perc_thresholdinMOD44B.MRTWEB.A2000065.051.Percent_Tree_Cover.tif")
I want to extract the number that follows the third "_". I tried this:
gsub("(?:.*?_){3}([^_]+)","\\1",ss)
I tested the expression at https://regex101.com/r/6QqHwf/6 and it works there. The output should be 20, 25, 30, but I get
[1] "areas" "areas" "Cover.tif"
Upvotes: 1
Views: 182
Reputation: 627101
Use the caret ^ to make sure you match at the start of the string, and also make sure you match the whole string with .* at the end of the pattern:
ss <- c("africa_AF_1_20_perc_threshold_in_MOD44B.MRTWEB.A2000065.051.Percent_Tree_Cover.tif_Patch_areas",
        "africa_AF_1_25_perc_threshold_in_MOD44B.MRTWEB.A2000065.051.Percent_Tree_Cover.tif_Patch_areas",
        "africa_AF_1_30_perc_thresholdinMOD44B.MRTWEB.A2000065.051.Percent_Tree_Cover.tif")
sub("^(?:[^_]*_){3}([^_]+).*", "\\1", ss)
## => [1] "20" "25" "30"
See the R demo. Note that you do not need gsub: since you only want to perform a single search-and-replace operation per string, sub will do.
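To see why the trailing .* matters, here is a minimal sketch that runs the pattern with and without it. The string x is a hypothetical shortened file name, chosen only to keep the output readable; it is not from the question.

```r
# Hypothetical short name (an assumption, not one of the original file names)
x <- "a_b_c_20_rest_of_name"

# Without the trailing .*, only the matched prefix is replaced,
# so everything after the fourth field stays in the result
no_tail <- sub("^(?:[^_]*_){3}([^_]+)", "\\1", x)

# With the trailing .*, the match covers the whole string,
# so the result is just the text captured in Group 1
with_tail <- sub("^(?:[^_]*_){3}([^_]+).*", "\\1", x)

print(no_tail)    # [1] "20_rest_of_name"
print(with_tail)  # [1] "20"
```

sub replaces only the text the pattern actually matched, so any part of the string you want gone has to be inside the match.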
Details
^ - start of string
(?:[^_]*_){3} - 3 occurrences of
  [^_]* - zero or more chars other than _
  _ - an underscore
([^_]+) - Group 1: one or more chars other than _
.* - the rest of the string.
The \1 is the replacement pattern that inserts the value captured in Group 1.
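If you would rather pull Group 1 out directly instead of replacing the whole string with it, one possible alternative (not part of the approach above) is regexec with regmatches. This is a sketch that assumes R 3.3.0 or later, where regexec accepts perl = TRUE; the names m and nums are illustrative.

```r
ss <- c("africa_AF_1_20_perc_threshold_in_MOD44B.MRTWEB.A2000065.051.Percent_Tree_Cover.tif_Patch_areas",
        "africa_AF_1_25_perc_threshold_in_MOD44B.MRTWEB.A2000065.051.Percent_Tree_Cover.tif_Patch_areas",
        "africa_AF_1_30_perc_thresholdinMOD44B.MRTWEB.A2000065.051.Percent_Tree_Cover.tif")

# regexec records the position of the full match and of each capture group;
# regmatches turns those positions into the matched substrings
m <- regmatches(ss, regexec("^(?:[^_]*_){3}([^_]+)", ss, perl = TRUE))

# Element 1 of each result is the full match, element 2 is Group 1
nums <- as.integer(sapply(m, `[`, 2))
print(nums)  # [1] 20 25 30
```

No trailing .* is needed here because nothing is being replaced; only the captured text is read.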
Upvotes: 2