kishore

Reputation: 187

Extract last substring between square brackets

I have a column of data from which I need to extract an alphanumeric string/factor. Example:

Column x
[ghjg6] [fdg5] [113gi4lki] great work 
[xzswedc: acf] [xzt8] [111eerrh5] 
[asd2] [1] [113vu17hg 115er5lgr 112cgnmbh ] get out

I want to get the contents of the last pair of square brackets, i.e. [113gi4lki], [111eerrh5] and [113vu17hg 115er5lgr 112cgnmbh], into a separate column. Please advise.

Upvotes: 0

Views: 2818

Answers (2)

Wiktor Stribiżew

Reputation: 627083

To get the text inside the last set of [...] brackets, you may use sub() with the following pattern:

".*\\[([^][]+)].*"

The pattern matches:

  • .* - any 0+ chars greedily, as many as possible, up to the last occurrence of the subsequent subpatterns
  • \\[ - a literal [ (must be escaped outside of the bracket expression)
  • ([^][]+) - Group 1 (later referred to with \1) matching 1 or more chars other than ] and [
  • ] - a literal ] (no need to escape it outside of a bracket expression)
  • .* - the rest of the string.
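To see what Group 1 actually captures, you can run the same pattern through base R's regexec() and regmatches() on a single string. This is a small sketch for inspection only (the variable names s and m are just for illustration); it uses the same default regex engine as sub(), so the captured group should match what "\\1" returns.

```r
# Inspect the full match and Group 1 of the pattern on one sample string
s <- "[ghjg6] [fdg5] [113gi4lki] great work"
m <- regmatches(s, regexec(".*\\[([^][]+)].*", s))[[1]]
m[1]  # whole match: the entire string, because of the leading and trailing .*
m[2]  # Group 1: the text inside the last [...]
```

Because the leading .* is greedy, it consumes everything up to the last [ that still allows the rest of the pattern to match, which is why Group 1 holds the last bracketed chunk rather than the first.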

R online demo:

x <- c("[ghjg6] [fdg5] [113gi4lki] great work", "[xzswedc: acf] [xzt8] [111eerrh5]", "[asd2] [1] [113vu17hg 115er5lgr 112cgnmbh ] get out", "Some text with no brackets")
df <- data.frame(x)
df$x = sub(".*\\[([^][]+)].*", "\\1", df$x)
df

Output:

                               x
1                      113gi4lki
2                      111eerrh5
3 113vu17hg 115er5lgr 112cgnmbh 
4     Some text with no brackets
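Note that row 3 keeps a trailing space, because the source string has a space before the closing ]. Since the question asks for a separate column, a variant could write the result to a new column and trim the whitespace with trimws(). This is a sketch; the column name last_bracket is hypothetical.

```r
x <- c("[ghjg6] [fdg5] [113gi4lki] great work",
       "[xzswedc: acf] [xzt8] [111eerrh5]",
       "[asd2] [1] [113vu17hg 115er5lgr 112cgnmbh ] get out",
       "Some text with no brackets")
df <- data.frame(x, stringsAsFactors = FALSE)
# Keep the original column and store the trimmed contents of the last [...]
# in a new (hypothetically named) column
df$last_bracket <- trimws(sub(".*\\[([^][]+)].*", "\\1", df$x))
df
```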

If you want entries with no [...] (like the last one in my test set) to become empty strings instead, use

df$x = sub(".*\\[([^][]+)].*|.*", "\\1", df$x)

See another online R demo.
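If you prefer not to rely on the |.* alternation, a more explicit sketch first checks with grepl() whether the pattern matches and only then calls sub(), returning an empty string otherwise. The names pat and last_bracket are illustrative only.

```r
x <- c("[ghjg6] [fdg5] [113gi4lki] great work",
       "[xzswedc: acf] [xzt8] [111eerrh5]",
       "[asd2] [1] [113vu17hg 115er5lgr 112cgnmbh ] get out",
       "Some text with no brackets")
pat <- ".*\\[([^][]+)].*"
# TRUE where the string contains at least one non-empty [...] group
has_brackets <- grepl(pat, x)
# Extract Group 1 where there is a match, otherwise use ""
last_bracket <- ifelse(has_brackets, sub(pat, "\\1", x), "")
last_bracket
```

Swapping "" for NA_character_ in the ifelse() call is an option if you would rather mark missing values explicitly.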

Upvotes: 2

jogo

Reputation: 12569

You can do:

Column.x <- c(
"[ghjg6] [fdg5] [113gi4lki] great work",
"[xzswedc: acf] [xzt8] [111eerrh5]",
"[asd2] [1] [113vu17hg 115er5lgr 112cgnmbh ] get out")
y <- gsub(".*\\[", "[", Column.x)
gsub("\\].*", "]", y)

Result:

> gsub("\\].*", "]", y)
[1] "[113gi4lki]"                      "[111eerrh5]"                      "[113vu17hg 115er5lgr 112cgnmbh ]"
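This approach keeps the surrounding brackets. If you only need the text inside them, one possible extra step (a sketch, not part of the original answer) is to strip a leading [ and a trailing ] with a single alternation.

```r
Column.x <- c(
  "[ghjg6] [fdg5] [113gi4lki] great work",
  "[xzswedc: acf] [xzt8] [111eerrh5]",
  "[asd2] [1] [113vu17hg 115er5lgr 112cgnmbh ] get out")
# Same two steps as above: cut everything before the last "[" and after the next "]"
with_brackets <- gsub("\\].*", "]", gsub(".*\\[", "[", Column.x))
# Remove the "[" at the start and the "]" at the end of each result
inner <- gsub("^\\[|\\]$", "", with_brackets)
inner
```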

If you want, you can combine both steps:

gsub("\\].*", "]", gsub(".*\\[", "[", Column.x))

Upvotes: 2
