David VR

Reputation: 11

Extract characters between specified characters in R

I have this variable x = "379_exp_mirror1.csv". I need to extract the number ("379") at the beginning (which doesn't always have 3 characters), i.e. everything before the first "_". Then I need to extract everything between the second "_" and the ".", in this case "mirror1".

I have tried several combinations of sub and gsub with no success. Can anyone give me some pointers, please?

Thank you

Upvotes: 1

Views: 2169

Answers (3)

akrun

Reputation: 887118

Maybe you can try:

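    library(stringr)
    x <- "379_exp_mirror1.csv"
    # Current stringr versions take the pattern string directly (no perl() wrapper);
    # lookaheads such as (?=_) are supported by the default regex engine
    str_extract_all(x, '^[0-9]+(?=_)|[[:alnum:]]+(?=\\.)')[[1]]
    #[1] "379"     "mirror1"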

Or

    strsplit(x, "[._]")[[1]][c(TRUE, FALSE)]
    #[1] "379"     "mirror1"

Or

    scan(text = gsub("[.]", "_", x), what = "", sep = "_")[c(TRUE, FALSE)]
    #Read 4 items
    #[1] "379"     "mirror1"
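
The two split-based versions above depend on logical index recycling, which is easy to miss. Here is a minimal sketch of what happens step by step, assuming the file name always has the form number_label_name.ext (the variable names `parts` and `picked` are only for illustration):

```r
x <- "379_exp_mirror1.csv"

# Split on either "." or "_" to get all four fields
parts <- strsplit(x, "[._]")[[1]]
parts
# [1] "379"     "exp"     "mirror1" "csv"

# c(TRUE, FALSE) is recycled along the vector, so elements 1 and 3 are kept
parts[c(TRUE, FALSE)]
# [1] "379"     "mirror1"

# Indexing by position says the same thing more explicitly
picked <- parts[c(1, 3)]
```

If a name contains extra underscores or dots, the positions shift, so this only holds for names with exactly that layout.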

Upvotes: 1

Sven Hohenstein

Reputation: 81693

You can use sub to extract the substrings:

x <- "379_exp_mirror1.csv" 

sub("_.*", "", x)
# [1] "379"

sub("^(?:.*_){2}(.*?)\\..*", "\\1", x)
# [1] "mirror1"

Another approach with gregexpr:

regmatches(x, gregexpr("^.*?(?=_)|(?<=_)[^_]*?(?=\\.)", x, perl = TRUE))[[1]]
# [1] "379"     "mirror1"
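
If you have several file names, both pieces can also be captured in one pass with regexec and capture groups. This is only a sketch: the second file name is made up for the example, and the pattern assumes exactly two underscores before the part you want.

```r
# The second name is a hypothetical example with a shorter number
files <- c("379_exp_mirror1.csv", "12_exp_mirror22.csv")

# Group 1: everything before the first "_"
# Group 2: everything between the second "_" and the first "."
pattern <- "^([^_]+)_[^_]+_([^.]+)\\."

# Each list element holds: full match, group 1, group 2
m <- regmatches(files, regexec(pattern, files))

ids   <- vapply(m, function(g) g[2], character(1))
names <- vapply(m, function(g) g[3], character(1))

ids
# [1] "379" "12"
names
# [1] "mirror1"  "mirror22"
```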

Upvotes: 1

user3859852

Reputation: 29

You can use a regular expression. For your problem, ^(?<Number>[0-9]*)_.* does the job.

You can test your regular expression on this website: http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx

Alternatively, you can split the string on the underscore and then try to parse the first piece (int.TryParse in .NET). I think the second approach is better, but if you want to become a regular expression master, try the first method.
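
Since the question is about R, here is a rough sketch of both methods in R. It assumes perl = TRUE is used so that the named group is accepted, and it uses as.integer as a stand-in for int.TryParse (it returns NA instead of a parsed value when the text is not a number). The variable names are only for illustration.

```r
x <- "379_exp_mirror1.csv"

# Method 1: the pattern above, run with perl = TRUE so the named group
# (?<Number>...) is accepted; it can still be referenced as group 1
number_chr <- sub("^(?<Number>[0-9]*)_.*", "\\1", x, perl = TRUE)

# Method 2: split on the underscore, then try to parse the first piece
first_piece <- strsplit(x, "_", fixed = TRUE)[[1]][1]
number_int  <- suppressWarnings(as.integer(first_piece))

# NA means the first piece was not a valid integer
is_valid <- !is.na(number_int)
```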

Upvotes: 1
