Alex

Reputation: 15708

Match exactly one occurrence and not consecutive occurrences

I have a file name with its directory path, as returned by list.files(..., full.names = TRUE). I want to split it on / to recover the directory structure, but I am having trouble matching only single occurrences of /, e.g.

strsplit("C://dir1/dir2/txt.R", "/")
# [[1]]
# [1] "C:"    ""      "dir1"  "dir2"  "txt.R"

when I desire the output to be:

[1] "C://"  "dir1"  "dir2"  "txt.R"

I looked at this answer, which seems to give a regex solution; however, I get an error when I try to get a 'literal' match:

> strsplit("C://dir1/dir2/txt.R", "\/")
Error: '\/' is an unrecognized escape in character string starting ""\/"

In fact, the regex in that example does not work in R:

> grepl('([\w\/]+)\/amp(\/\w+[-\/]\w+\/?)', '/name/amp/test-123')
Error: '\w' is an unrecognized escape in character string starting "'([\w"

Upvotes: 0

Views: 114

Answers (4)

Wiktor Stribiżew

Reputation: 626738

A very simple matching approach would be

x <- "C://dir1/dir2/txt.R"
regmatches(x, gregexpr("[^/]+(?://)?", x))
# or with stringr (needs library(stringr))
str_extract_all(x, "[^/]+(?://)?")
# [[1]]
# [1] "C://"  "dir1"  "dir2"  "txt.R"

See the regex demo and the R online demo.

Pattern details

  • [^/]+ - 1 or more chars other than /
  • (?://)? - an optional sequence of two /.
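As a side note on the errors quoted in the question: / has no special meaning in a regex, so it never needs escaping, and the "unrecognized escape" messages come from R's string parser rather than from the regex engine. A minimal sketch of the fix, assuming the goal was to run the pattern from the linked answer: double every backslash inside the R string literal and drop the escapes on /. The perl = TRUE argument is my own addition, so that \w inside a bracket expression is interpreted by PCRE.

```r
# "/" is an ordinary character in a regex; no backslash is needed
parts <- strsplit("C://dir1/dir2/txt.R", "/")[[1]]
print(parts)

# Regex escapes such as \w need a doubled backslash in an R string literal
pattern <- "([\\w/]+)/amp(/\\w+[-/]\\w+/?)"
ok <- grepl(pattern, "/name/amp/test-123", perl = TRUE)
print(ok)
```

The first call reproduces the split from the question; the second call parses without an error because the string literal itself is now valid.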

Note that if you want to ignore // inside the path and only keep it at the beginning, you can add an alternative like ^[[:alpha:]]:// or put a lookbehind (?<=^[[:alpha:]]:) inside the optional group:

regmatches(x, gregexpr("[^/]+(?:(?<=^[[:alpha:]]:)//)?", x, perl=TRUE))
# or
regmatches(x, gregexpr("^[[:alpha:]]://|[^/]+", x))

See this and that regex demo.
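To see where the simple and the anchored patterns differ, here is a small sketch on a made-up path with a doubled slash in the middle. The second path is purely illustrative, and I pass perl = TRUE to both calls only to keep the regex flavour the same.

```r
paths <- c("C://dir1/dir2/txt.R", "C://dir1//dir2/txt.R")

# Simple pattern: any run of non-slashes may absorb a following "//"
simple <- regmatches(paths, gregexpr("[^/]+(?://)?", paths, perl = TRUE))

# Anchored pattern: "//" is kept only after a leading drive letter and colon
anchored <- regmatches(paths, gregexpr("^[[:alpha:]]://|[^/]+", paths, perl = TRUE))

print(simple[[2]])    # "dir1//" keeps its trailing slashes
print(anchored[[2]])  # the interior "//" is dropped
```

Both patterns give the same result for the first path; they only diverge once // appears after the drive prefix.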

Upvotes: 2

Avinash Raj

Reputation: 174696

KISS,

strsplit("C://dir1/dir2/txt.R", "\\b/\\b|(?<=//)", perl = TRUE)[[1]]
# [1] "C://"  "dir1"  "dir2"  "txt.R"

Upvotes: 2

Gurmanjot Singh

Reputation: 10360

Try this code:

strsplit("C://dir1/dir2/txt.R", "(?<=//)|(?<!/)/(?!/)", perl=TRUE)

See output here

Explanation:

  • (?<=//) - finds the position immediately preceded by a //
  • | - OR
  • (?<!/)/(?!/) - matches a / which is neither preceded by a / nor followed by a /
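A quick way to check the two alternatives is to wrap the call in a helper and try it on a path with and without the drive prefix. The name split_path and the second input are hypothetical, used only for illustration.

```r
# Hypothetical helper wrapping the strsplit call above
split_path <- function(path) {
  strsplit(path, "(?<=//)|(?<!/)/(?!/)", perl = TRUE)[[1]]
}

drive <- split_path("C://dir1/dir2/txt.R")
relative <- split_path("dir1/dir2/txt.R")

print(drive)     # the "//" stays attached to "C:"
print(relative)  # only single slashes are present, so each one splits
```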

Regex Demo

Upvotes: 3

akrun

Reputation: 887028

One option would be to match two or more consecutive occurrences of / and SKIP them, while splitting on a single / or on the word boundary that follows a /

strsplit("C://dir1/dir2/txt.R", "[/]{2,}(*SKIP)(*F)|\\b[/]|(?<=[/])\\b", perl = TRUE)[[1]]
#[1] "C://"  "dir1"  "dir2"  "txt.R"
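The effect of (*SKIP)(*F) is easier to see in a substitution than in a split. The sketch below uses a reduced pattern of my own, not the exact one above: runs of two or more slashes are matched and then discarded, so only the remaining single slashes get replaced.

```r
x <- "C://dir1/dir2/txt.R"

# "/{2,}(*SKIP)(*F)" consumes a run of slashes and forces a failure there,
# so the regex engine resumes after the run; the plain "/" alternative
# then matches only the single slashes that are left.
marked <- gsub("/{2,}(*SKIP)(*F)|/", " | ", x, perl = TRUE)
print(marked)
```

These backtracking verbs are a PCRE feature, so perl = TRUE is required.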

Upvotes: 2
