val

Reputation: 1699

Using regex in tidyverse to filter: unclear why this does not work

I'm trying to filter rows containing the string "Data\\this\\way\\test", but I don't understand why my code returns no rows. I would expect output like:

  "D:\\Data\\this\\way\\test\\dat1",
  "D:\\Data\\this\\way\\test\\dat2",

My code:

library(tidyverse)

files <- c(
  "D:\\Data\\this\\way\\test\\dat1",
  "D:\\Data\\this\\way\\test\\dat2",
  "D:\\Data\\not-this\\way\\test\\dat1",
  "D:\\Data\\not-this\\way\\test\\dat2"
)

files_filt_df <- data.frame(filenames = files, 
                             stringsAsFactors = FALSE) %>%
  filter(str_detect(filenames,"Data\\this\\way\\test"))
files_filt_df
#> [1] filenames
#> <0 rows> (or 0-length row.names)

Upvotes: 1

Views: 457

Answers (3)

hello_friend

Reputation: 5788

Base R solution (find the literal pattern in files, return all values matching pattern):

data.frame(files = grep("\\Data\\this\\way\\", files, value = TRUE, fixed = TRUE), stringsAsFactors = FALSE)
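
If the goal is to filter an existing data frame rather than build a new one, the same literal matching should also work with `grepl`, which returns a logical vector instead of the matching values. A minimal sketch, assuming the `files` vector from the question; the names `files_df` and `keep` are illustrative only:

```r
files <- c(
  "D:\\Data\\this\\way\\test\\dat1",
  "D:\\Data\\this\\way\\test\\dat2",
  "D:\\Data\\not-this\\way\\test\\dat1",
  "D:\\Data\\not-this\\way\\test\\dat2"
)

files_df <- data.frame(filenames = files, stringsAsFactors = FALSE)

# fixed = TRUE makes grepl compare characters literally, so the
# backslashes are not interpreted as regex escapes.
keep <- grepl("Data\\this\\way\\test", files_df$filenames, fixed = TRUE)

# drop = FALSE keeps the result a data frame even with a single column.
files_filt_df <- files_df[keep, , drop = FALSE]
files_filt_df
```

This avoids any package dependency; the trade-off is that the pattern can only be a literal substring.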

Upvotes: 0

dipetkov

Reputation: 3690

Since these are filenames, you can also use the fs package to check if a file has a particular parent and let fs deal with the file separators.

library("tidyverse")

files <- c(
  "D:\\Data\\this\\way\\test\\dat1",
  "D:\\Data\\this\\way\\test\\dat2",
  "D:\\Data\\not-this\\way\\test\\dat1",
  "D:\\Data\\not-this\\way\\test\\dat2"
)

tibble(
  file = files
) %>%
  filter(map_lgl(file, ~ fs::path_has_parent(., "D:/Data/this/way")))
#> # A tibble: 2 x 1
#>   file                             
#>   <chr>                            
#> 1 "D:\\Data\\this\\way\\test\\dat1"
#> 2 "D:\\Data\\this\\way\\test\\dat2"

# Explanation:

# The `map_lgl` applies `fs::path_has_parent` to each file
# and returns TRUE/FALSE (logical = lgl) values.

# Without `map`:
fs::path_has_parent(files, "D:/Data/this/way")
#> [1] FALSE

# With `map`:
map_lgl(files, ~ fs::path_has_parent(., "D:/Data/this/way"))
#> [1]  TRUE  TRUE FALSE FALSE

# The `~` operator creates a formula, which purrr converts to a function.
# Here it is shorter than defining an inline function.

# Formula:
map_lgl(files, ~ fs::path_has_parent(., "D:/Data/this/way"))
#> [1]  TRUE  TRUE FALSE FALSE

# Function:
map_lgl(files, function(x) fs::path_has_parent(x, "D:/Data/this/way"))
#> [1]  TRUE  TRUE FALSE FALSE

Created on 2019-11-02 by the reprex package (v0.3.0)

Upvotes: 2

MrFlick

Reputation: 206197

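By default, str_detect treats the pattern as a regular expression, where sequences such as \w and \t have special meaning (a word character and a tab). In the R string "Data\\this\\way\\test", each \\ is a single backslash, so the regex engine sees \t, \w, and \t rather than literal backslashes. If you just want to match a literal value, the easiest way is to wrap the pattern in fixed():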

files_filt_df <- data.frame(filenames = files, 
                            stringsAsFactors = FALSE) %>%
  filter(str_detect(filenames,fixed("Data\\this\\way\\test")))

Or, if you want to use a regular expression, you need to add an additional level of escaping on the backslashes:

files_filt_df <- data.frame(filenames = files, 
                            stringsAsFactors = FALSE) %>%
  filter(str_detect(filenames,"Data\\\\this\\\\way\\\\test"))
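
To see the difference between the three patterns, it can help to print the string as the regex engine receives it and compare the results on a single path. A minimal sketch, assuming only stringr is installed; the names `path`, `pattern`, `plain_regex`, `literal`, and `escaped` are illustrative and not part of the question:

```r
library(stringr)

# One path from the question; in R source, "\\" is one literal backslash.
path <- "D:\\Data\\this\\way\\test\\dat1"
pattern <- "Data\\this\\way\\test"

# writeLines() shows the characters actually stored in the string.
writeLines(pattern)
#> Data\this\way\test

# As a regex, \t means tab and \w means a word character, so nothing matches.
plain_regex <- str_detect(path, pattern)

# fixed() compares the characters literally.
literal <- str_detect(path, fixed(pattern))

# Doubling the backslashes again yields the regex \\, a literal backslash.
escaped <- str_detect(path, "Data\\\\this\\\\way\\\\test")

c(plain_regex = plain_regex, literal = literal, escaped = escaped)
#> plain_regex     literal     escaped 
#>       FALSE        TRUE        TRUE 
```

Using fixed() is usually the simpler choice when no regex features are needed, since the pattern then reads the same as the path it is meant to match.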

Upvotes: 2
