baxx

Reputation: 4695

How to extract a certain part of a string in R using regular expressions

How can I convert the following string in R:

this_isastring_12(=32)

so that only the following is kept

isastring_12

For example:

f('this_isastring_12(=32)') returns 'isastring_12'

This should also work on other strings with the same structure but different characters.

Another example, using a different string with the same structure:

f('something_here_3(=1)') returns 'here_3'

Upvotes: 0

Views: 171

Answers (2)

moodymudskipper

Reputation: 47300

You could use the package unglue.

Borrowing Ronak's data:

x <- c("this_isastring_12(=32)", "something_here_3(=1)", "another_string_4(=1)")
library(unglue)
unglue_vec(x, "{=.*?}_{res}({=.*?})")
#> [1] "isastring_12" "here_3"       "string_4" 
  • {=.*?} matches anything until what's next is matched, but doesn't extract anything because there's no name on the left-hand side of the equals sign
  • {res}, where the name res could be replaced by anything, matches anything, and extracts it
  • outside of curly braces, no need to escape characters
  • unglue_vec() returns an atomic vector of the matches

Upvotes: 1

Ronak Shah

Reputation: 388817

We can use sub to extract everything between the first underscore and the opening round bracket in the text.

sub(".*?_(.*)\\(.*", "\\1", x)
#[1] "isastring_12" "here_3"       "string_4"    

where x is

x <- c("this_isastring_12(=32)", "something_here_3(=1)", "another_string_4(=1)")
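To get the f() asked for in the question, the sub call can be wrapped in a small function. The following is only a sketch: the name f comes from the question, while the pattern is a variant of the one above that swaps the lazy .*? for [^_]* and anchors both ends, on the assumption that every input contains an underscore followed later by an opening round bracket.

```r
# Sketch of the f() from the question, using base R only.
# Assumption: each input has at least one "_" and a later "(".
f <- function(s) {
  # ^[^_]*_  skips everything up to and including the first underscore
  # (.*)     captures the part to keep
  # \\(.*$   drops the opening round bracket and everything after it
  sub("^[^_]*_(.*)\\(.*$", "\\1", s)
}

f("this_isastring_12(=32)")
#> [1] "isastring_12"
f("something_here_3(=1)")
#> [1] "here_3"
```

Because sub is vectorised, f(x) also works on the whole vector x at once. A string that doesn't match the pattern is returned unchanged, so inputs with a different structure may need a separate check.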

Upvotes: 1
