Reputation: 187
I have a column of data from which I need to extract an alphanumeric string/factor. Example:
Column x
[ghjg6] [fdg5] [113gi4lki] great work
[xzswedc: acf] [xzt8] [111eerrh5]
[asd2] [1] [113vu17hg 115er5lgr 112cgnmbh ] get out
I want to get the data in the last set of square brackets, i.e. [113gi4lki], [111eerrh5] and [113vu17hg 115er5lgr 112cgnmbh], in a separate column. Please advise.
Upvotes: 0
Views: 2818
Reputation: 627083
To get the text inside the last set of [...] brackets, you may use sub with the following pattern:
".*\\[([^][]+)].*"
The pattern matches:
.* - any 0+ chars greedily, as many as possible, up to the last occurrence of the subsequent subpatterns
\\[ - a literal [ (must be escaped outside of a bracket expression)
([^][]+) - Group 1 (later referred to with \1) matching 1 or more chars other than ] and [
] - a literal ] (no need to escape it outside of a bracket expression)
.* - the rest of the string.
x <- c("[ghjg6] [fdg5] [113gi4lki] great work", "[xzswedc: acf] [xzt8] [111eerrh5]", "[asd2] [1] [113vu17hg 115er5lgr 112cgnmbh ] get out", "Some text with no brackets")
df <- data.frame(x)
df$x = sub(".*\\[([^][]+)].*", "\\1", df$x)
df
Output:
x
1 113gi4lki
2 111eerrh5
3 113vu17hg 115er5lgr 112cgnmbh
4 Some text with no brackets
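Since the question asks for the result in a separate column, here is a minimal sketch that leaves x untouched and writes the extracted text to a new column instead. The column name last_bracket is only an illustrative choice, and trimws() is added as an assumption that the trailing space inside the third row's brackets is unwanted.

```r
# Sample data from the question, plus one row without brackets
df <- data.frame(
  x = c("[ghjg6] [fdg5] [113gi4lki] great work",
        "[xzswedc: acf] [xzt8] [111eerrh5]",
        "[asd2] [1] [113vu17hg 115er5lgr 112cgnmbh ] get out",
        "Some text with no brackets"),
  stringsAsFactors = FALSE
)

# Same pattern as above; 'last_bracket' is a hypothetical column name.
# trimws() drops leading/trailing whitespace left inside the brackets.
df$last_bracket <- trimws(sub(".*\\[([^][]+)].*", "\\1", df$x))
df$last_bracket
```

Rows with no brackets are passed through unchanged, because sub returns its input when the pattern does not match.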
If you want an empty string for the entries with no [...] (like the last one in my test set), add an alternative that matches the whole string, so that Group 1 stays empty for them:
df$x = sub(".*\\[([^][]+)].*|.*", "\\1", df$x)
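If a missing value is preferable to an empty string for rows without brackets, one possible sketch is to test for a match first with grepl and only then extract. The variable names below (pattern, has_brackets, extracted) are illustrative, not part of the original answer.

```r
x <- c("[asd2] [1] [113vu17hg 115er5lgr 112cgnmbh ] get out",
       "Some text with no brackets")

pattern <- ".*\\[([^][]+)].*"

# TRUE where the string contains at least one non-empty [...] group
has_brackets <- grepl(pattern, x)

# Extract Group 1 where there is a match, NA otherwise
extracted <- ifelse(has_brackets,
                    trimws(sub(pattern, "\\1", x)),
                    NA_character_)
extracted
```

Rows can then be dropped with extracted[!is.na(extracted)] if they really need to be removed rather than blanked.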
Upvotes: 2
Reputation: 12569
You can do:
Column.x <- c(
"[ghjg6] [fdg5] [113gi4lki] great work",
"[xzswedc: acf] [xzt8] [111eerrh5]",
"[asd2] [1] [113vu17hg 115er5lgr 112cgnmbh ] get out")
y <- gsub(".*\\[", "[", Column.x)
gsub("\\].*", "]", y)
result:
> gsub("\\].*", "]", y)
[1] "[113gi4lki]" "[111eerrh5]" "[113vu17hg 115er5lgr 112cgnmbh ]"
If you want, you can combine both steps into one call:
gsub("\\].*", "]", gsub(".*\\[", "[", Column.x))
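The two-step approach keeps the surrounding brackets in the result. If only the inner text is wanted, a possible follow-up (a sketch, with with_brackets and inner as illustrative names) is to strip the leading [ and the trailing ] together with any whitespace before it:

```r
Column.x <- c(
  "[ghjg6] [fdg5] [113gi4lki] great work",
  "[xzswedc: acf] [xzt8] [111eerrh5]",
  "[asd2] [1] [113vu17hg 115er5lgr 112cgnmbh ] get out")

# Steps from the answer: keep the last "[...]" including its brackets
with_brackets <- gsub("\\].*", "]", gsub(".*\\[", "[", Column.x))

# Remove the opening "[" and the closing "]" plus whitespace before it
inner <- gsub("^\\[|[[:space:]]*\\]$", "", with_brackets)
inner
```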
Upvotes: 2