Leosar

Reputation: 2072

How to extract 2 chars after the third occurrence of "_" using regex in R

I want to extract some information from file names using regex. The file names are in this vector of strings:

ss <-c("africa_AF_1_20_perc_threshold_in_MOD44B.MRTWEB.A2000065.051.Percent_Tree_Cover.tif_Patch_areas","africa_AF_1_25_perc_threshold_in_MOD44B.MRTWEB.A2000065.051.Percent_Tree_Cover.tif_Patch_areas","africa_AF_1_30_perc_thresholdinMOD44B.MRTWEB.A2000065.051.Percent_Tree_Cover.tif")

I want to extract the number after the third "_" in each string. I tried this:

gsub("(?:.*?_){3}([^_]+)","\\1",ss)

I tested the expression at https://regex101.com/r/6QqHwf/6 and it works there. The output should be 20, 25, 30, but I get:

[1] "areas"     "areas"     "Cover.tif"

Upvotes: 1

Views: 182

Answers (1)

Wiktor Stribiżew

Reputation: 627101

Use the caret ^ to make sure you match at the start of the string, and also make sure you match the whole string by adding .* at the end of the pattern:

ss <-c("africa_AF_1_20_perc_threshold_in_MOD44B.MRTWEB.A2000065.051.Percent_Tree_Cover.tif_Patch_areas","africa_AF_1_25_perc_threshold_in_MOD44B.MRTWEB.A2000065.051.Percent_Tree_Cover.tif_Patch_areas","africa_AF_1_30_perc_thresholdinMOD44B.MRTWEB.A2000065.051.Percent_Tree_Cover.tif")
sub("^(?:[^_]*_){3}([^_]+).*", "\\1", ss)
## => [1] "20" "25" "30"

See the R demo. Note that you do not need gsub here: you only want to perform a single search-and-replace operation per string, so sub will do.
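As a rough illustration of the sub versus gsub point, here is a minimal sketch using a shortened, hypothetical input vector (x is not the original ss). Because the pattern is anchored with ^ and consumes the rest of the string with .*, it can match at most once per string, so both functions should return the same result:

```r
# Hypothetical, shortened file names (not the original vector from the question)
x <- c("africa_AF_1_20_perc_threshold", "africa_AF_1_25_perc_threshold")
pat <- "^(?:[^_]*_){3}([^_]+).*"

res_sub  <- sub(pat, "\\1", x)   # replaces the first match only
res_gsub <- gsub(pat, "\\1", x)  # tries every match, but the anchor allows only one

print(res_sub)
print(identical(res_sub, res_gsub))
```

With an unanchored pattern the two functions can differ, which is why the anchor and the trailing .* matter more than the choice of function.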

Details

  • ^ - start of string
  • (?:[^_]*_){3} - 3 occurrences of
    • [^_]* - zero or more chars other than _
    • _ - an underscore
  • ([^_]+) - Group 1: one or more chars other than _
  • .* - the rest of the string.

The \1 is the replacement pattern that inserts the value captured in Group 1.
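One thing the breakdown implies: if a string has fewer than three underscores, the pattern does not match and sub returns the input unchanged. The sketch below shows this with hypothetical inputs (ss_short and the "no_match" entry are made up for illustration) and cross-checks the regex against a non-regex alternative that splits on "_" and takes the fourth field:

```r
# Hypothetical inputs; the last one has fewer than three underscores
ss_short <- c("africa_AF_1_20_perc", "africa_AF_1_25_perc", "no_match")
pat <- "^(?:[^_]*_){3}([^_]+).*"

# sub() leaves a string untouched when the pattern does not match
via_sub <- sub(pat, "\\1", ss_short)

# Alternative without a regex: split on "_" and take the 4th field
# (indexing past the end of a character vector yields NA)
via_split <- vapply(strsplit(ss_short, "_", fixed = TRUE),
                    function(parts) parts[4], character(1))

print(via_sub)
print(via_split)
```

The split-based version returns NA for the short string, which may be easier to detect than an unchanged input if some file names do not follow the expected layout.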

Upvotes: 2
