messamat

Reputation: 75

regex fails with dollar sign

In R, I'm trying to match a subset of strings from a vector of file names. I only want the names that contain no letters before the extension (digits and underscores only) and that end with .tif

allfiles <- c("181129_16_00_class_mlc.tif", "181129_16_00.tif.aux.xml", "181129_17_00_01_19.tif", "181129_17_00_01_20.tif", "181129_17_00_01_23.tif", "181129_17_00_01_24.tif", "181129_17_00_01_25.tif", "181129_17_00_01_26.tif", "181129_17_00_01_27.tif", "181129_17_00_01_28.tif", "181129_17_00_01_29.tif", "181129_17_00_01_30.tif")

 grepl("^[0-9_]+[.tif]", allfiles)
 grepl("^[0-9_]+[.tif]$", allfiles)

That returns:

[1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Why does the dollar sign fail? The result I expected from the second grepl was:

[1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

Upvotes: 1

Views: 108

Answers (1)

Julius Vainora

Reputation: 48211

It's not the $ that fails but the use of square brackets, which define a character class rather than a literal string. Instead you want

grepl("^[0-9_]+\\.tif$", allfiles)
# [1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

By contrast, ^[0-9_]+[.tif]$ means that after the run of digits and/or underscores, the string must end with exactly one character from the set ., t, i, f. For instance,

grepl("^[0-9_]+[.tif]$", "1234t")
# [1] TRUE
grepl("^[0-9_]+[.tif]$", "1234tt")
# [1] FALSE
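
To see why the first, unanchored pattern in the question returned TRUE for the .aux.xml file, it can help to print what the pattern actually consumed. The following is a minimal sketch on a shortened copy of the vector; the names files, matched, keep and keep_alt are made up for illustration and do not appear in the question.

```r
# Shortened, illustrative subset of the file names from the question
files <- c("181129_16_00_class_mlc.tif",
           "181129_16_00.tif.aux.xml",
           "181129_17_00_01_19.tif")

# Without $, the character class [.tif] consumes a single character (the dot),
# so any name with digits/underscores followed by "." counts as a match.
m <- regexpr("^[0-9_]+[.tif]", files)
matched <- regmatches(files, m)   # non-matching elements are dropped
print(matched)

# Escaped dot plus the literal "tif", anchored at both ends
keep <- grepl("^[0-9_]+\\.tif$", files)
print(files[keep])

# Alternative spelling: a one-character class [.] also means a literal dot
keep_alt <- grepl("^[0-9_]+[.]tif$", files)
```

Here matched contains only the leading digits, underscores and the dot, which shows that the rest of the name (tif.aux.xml) was never checked. Both corrected patterns should keep only the third name.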

Upvotes: 3
