SteveS

Reputation: 4040

Extract pattern from filename using regex in R?

I have the following string:

"UNKNOWN_{_requestID___b9b6bcc4-c163-45d7-82d9-423a96cf5fe1_,_deviceID___9c84f871-9e95-45d5-9335-12e7d42b96a0_}_2018-08-15-15-43-01-296_529307b7-6316-4cdc-ab53-2e1158c651c6.txt"

and I want to extract the 529307b7-6316-4cdc-ab53-2e1158c651c6 part (the last part between _ and .txt).

Here is what I am trying to do using regex:

^\_\w\.txt, but without luck. I keep experimenting with it; please advise on the right strategy and how to "attack" this.

Upvotes: 2

Views: 77

Answers (4)

RavinderSingh13

Reputation: 133508

Could you please try the following.

gsub(".*_|\\.txt","",x)

The output will be as follows.

[1] "529307b7-6316-4cdc-ab53-2e1158c651c6"

Explanation: The following is added only for explanation purposes.

gsub(     ##Using gsub (R's global substitution function), which replaces every match in the variable's value, not just the first.
".*_      ##Regex that matches everything from the start of the string up to the last _ (underscore), because .* is greedy.
|         ##| (pipe) means OR, so either the previous or the following regex may match in the variable's value.
\\.txt"   ##\\. escapes the dot so that it is treated as a literal dot rather than its special meaning; this matches the string .txt
,""       ##Whichever of the above regexes match (one or both), substitute the matches with "", i.e. an empty string.
,x)       ##The variable named x on which gsub is performed.

Where the value of the input variable x is as follows.

x <- "UNKNOWN_{_requestID___b9b6bcc4-c163-45d7-82d9-423a96cf5fe1_,_deviceID___9c84f871-9e95-45d5-9335-12e7d42b96a0_}_2018-08-15-15-43-01-296_529307b7-6316-4cdc-ab53-2e1158c651c6.txt"
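Because gsub is vectorized, the same call should also work on several file names at once. Below is a minimal sketch under that assumption; the file names are hypothetical and invented for illustration, and the "$" anchor is an addition not present in the call above.

```r
# Hypothetical file names, invented for illustration only.
files <- c(
  "UNKNOWN_2018-08-15_529307b7-6316-4cdc-ab53-2e1158c651c6.txt",
  "report_v2_abc123.txt"
)

# Same alternation as above, with "$" added so that ".txt" is only
# removed when it is the extension at the very end of the string.
ids <- gsub(".*_|\\.txt$", "", files)
print(ids)
```

Each element is processed independently, so the result has the same length as the input vector.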

Upvotes: 1

Roman Luštrik

Reputation: 70643

Here's an approach using a hidden gem from the tools package.

x <- "UNKNOWN_{_requestID___b9b6bcc4-c163-45d7-82d9-423a96cf5fe1_,_deviceID___9c84f871-9e95-45d5-9335-12e7d42b96a0_}_2018-08-15-15-43-01-296_529307b7-6316-4cdc-ab53-2e1158c651c6.txt"

out <- strsplit(x, "_")[[1]]
out <- out[length(out)]
tools::file_path_sans_ext(out)

[1] "529307b7-6316-4cdc-ab53-2e1158c651c6"
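If this is needed in more than one place, the three steps could be wrapped in a small function. The sketch below is only an illustration: last_token is a hypothetical helper name (it is not part of tools), it handles a single path at a time, and the call to basename is an added assumption that the input may include a directory.

```r
# Hypothetical helper (not part of the tools package): return the text
# after the last underscore of a file name, without the file extension.
last_token <- function(path) {
  # Drop any directory part, then split the file name on literal "_".
  parts <- strsplit(basename(path), "_", fixed = TRUE)[[1]]
  # Keep the last piece and strip its extension.
  tools::file_path_sans_ext(parts[length(parts)])
}

last_token("some/dir/run_01_abc123.csv")
```

Since file_path_sans_ext does not care which extension is present, the same helper should work for files that do not end in .txt.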

Upvotes: 2

Nar

Reputation: 658

Apply sub twice:

    text <- c("UNKNOWN_{_requestID___b9b6bcc4-c163-45d7-82d9-423a96cf5fe1_,_deviceID___9c84f871-9e95-45d5-9335-12e7d42b96a0_}_2018-08-15-15-43-01-296_529307b7-6316-4cdc-ab53-2e1158c651c6.txt" )
    sub("\\.txt.*", "", sub(".*\\_", "", text)) 

Upvotes: 1

Wiktor Stribiżew

Reputation: 626794

You may use

sub("^.*_(.*)\\.txt$", "\\1", x)

See the regex demo

sub will perform a single search and replace operation. It will find a match if the string conforms to the following:

  • ^ start of string
  • .*_ - any 0+ chars, as many as possible, up to the last _
  • (.*) - any 0+ chars, as many as possible (captured into Group 1, later referred to with \1 from the replacement pattern), up to but not including...
  • \\.txt$ - .txt (. must be escaped to match a literal dot) at the end of the string ($).

R demo:

x <- "UNKNOWN_{_requestID___b9b6bcc4-c163-45d7-82d9-423a96cf5fe1_,_deviceID___9c84f871-9e95-45d5-9335-12e7d42b96a0_}_2018-08-15-15-43-01-296_529307b7-6316-4cdc-ab53-2e1158c651c6.txt"
sub("^.*_(.*)\\.txt$", "\\1", x)
## => [1] "529307b7-6316-4cdc-ab53-2e1158c651c6"
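One thing to keep in mind is that sub returns its input unchanged when the pattern does not match, so a malformed file name passes through silently. Here is a minimal sketch of one way to guard against that; extract_id is a hypothetical helper name and both sample strings are made up for illustration.

```r
x_ok  <- "prefix_529307b7-6316-4cdc-ab53-2e1158c651c6.txt"  # hypothetical
x_bad <- "no-underscore-here.csv"                             # hypothetical

pattern <- "^.*_(.*)\\.txt$"

# No match: sub() hands back the original string untouched.
unchanged <- sub(pattern, "\\1", x_bad)

# Hypothetical helper: return NA for names that do not conform.
extract_id <- function(s) {
  ifelse(grepl(pattern, s), sub(pattern, "\\1", s), NA_character_)
}

res <- extract_id(c(x_ok, x_bad))
print(res)
```

Returning NA makes non-conforming names easy to spot with is.na() afterwards.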

Upvotes: 3
