Reputation: 5129
In R, I have a variable Author, with the value "(Bernoulli)Cuatrec."
I want to have only the names, so I'm using the following regex:
L <- strsplit(Author,"[()]")
but that's giving me 3 strings as result:
"" "Bernoulli" "Cuatrec."
How can I get only the two names, without the empty string?
PS: My actual regex is more complicated, it's simplified here.
Upvotes: 2
Views: 1164
Reputation: 269654
In the solutions below, set rmChars and splitChars (for the first solution) and chars (for the second solution) to patterns representing the actual sets of characters you need to use. Depending on your words and non-words, you might be able to use built-in classes such as chars <- "\\W", which sets chars to all non-word characters.
1) Remove the ( first and then split on ). Assuming s is the input string:
rmChars <- "[(]"
splitChars <- "[)]"
strsplit(gsub(rmChars, "", s), splitChars)[[1]]
giving:
[1] "Bernoulli" "Cuatrec."
2) Another possibility is to replace each character matching chars with a space, trim the ends, and then split on space.
chars <- "[()]"
strsplit(trimws(gsub(chars, " ", s)), " ")[[1]]
giving:
[1] "Bernoulli" "Cuatrec."
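For reference, both approaches can be run end to end on the string from the question. This is only a sketch: s is assumed to hold the question's value, and s2 is a made-up input added to illustrate one caveat of approach 2, namely that it also splits on spaces that are already part of a name.

```r
s <- "(Bernoulli)Cuatrec."

# 1) Remove "(" and then split on ")"
rmChars <- "[(]"
splitChars <- "[)]"
res1 <- strsplit(gsub(rmChars, "", s), splitChars)[[1]]

# 2) Replace each delimiter with a space, trim the ends, split on space
chars <- "[()]"
res2 <- strsplit(trimws(gsub(chars, " ", s)), " ")[[1]]

print(res1)  # [1] "Bernoulli" "Cuatrec."
print(res2)  # [1] "Bernoulli" "Cuatrec."

# Hypothetical input (not from the question) with a space inside a name
s2 <- "(De Candolle)Cuatrec."
res1b <- strsplit(gsub(rmChars, "", s2), splitChars)[[1]]
res2b <- strsplit(trimws(gsub(chars, " ", s2)), " ")[[1]]

print(res1b)  # [1] "De Candolle" "Cuatrec."
print(res2b)  # [1] "De"       "Candolle" "Cuatrec."
```

If names can contain spaces, approach 1 is probably the safer choice of the two.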
Upvotes: 3
Reputation: 7190
If your data always follow the same pattern, you can just use this:
strsplit(Author,"[[:punct:]]")[[1]][-1]
[1] "Bernoulli" "Cuatrec"
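Note that [[:punct:]] also matches the trailing period, so it is dropped from "Cuatrec.". Of course, if the pattern is irregular (for example, if the string does not start with a delimiter), this solution will not work, because [-1] always discards the first element.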
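A small sketch of what the [-1] step relies on, using the string from the question plus a made-up second input (Author2 is hypothetical, not from the question) that does not start with a delimiter:

```r
Author <- "(Bernoulli)Cuatrec."

# Splitting on any punctuation yields a leading empty string,
# and the final "." is consumed as a delimiter.
parts <- strsplit(Author, "[[:punct:]]")[[1]]
print(parts)      # [1] ""          "Bernoulli" "Cuatrec"
print(parts[-1])  # [1] "Bernoulli" "Cuatrec"

# Hypothetical input with no leading delimiter: there is no empty
# first element, so [-1] removes a real name instead.
Author2 <- "Bernoulli)Cuatrec"
parts2 <- strsplit(Author2, "[[:punct:]]")[[1]]
print(parts2[-1])  # [1] "Cuatrec"
```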
Upvotes: 0
Reputation: 5129
I tend to avoid installing new libraries whenever possible, so I can simply do:
L <- strsplit(Author,"[()]")[[1]]
L <- L[which(L != "")]
I thought there would be a solution without the need for a library.
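If Author holds several values rather than one, the same base-R idea can be wrapped in a small function. This is a sketch under assumptions: split_names and the authors vector are hypothetical names introduced here, and nzchar() is used as a stand-in for the L != "" comparison.

```r
# Hypothetical helper: split each string on `pattern` and drop empty pieces.
# Returns a list with one character vector per input string.
split_names <- function(x, pattern = "[()]") {
  lapply(strsplit(x, pattern), function(parts) parts[nzchar(parts)])
}

# Made-up input vector; the first element is the value from the question.
authors <- c("(Bernoulli)Cuatrec.", "Cuatrec.")
result <- split_names(authors)

print(result[[1]])  # [1] "Bernoulli" "Cuatrec."
print(result[[2]])  # [1] "Cuatrec."
```

Filtering after the split also removes empty strings wherever they occur, not only at the start, so it does not depend on where the delimiters sit in the input.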
Upvotes: 0