Rodrigo

Reputation: 5129

strsplit returns empty string with regex

In R, I have a variable Author, with the value "(Bernoulli)Cuatrec."

I want to have only the names, so I'm using the following regex:

L <- strsplit(Author,"[()]")

but that gives me three strings as a result:

""          "Bernoulli" "Cuatrec."

How can I get only the two names, without the empty string?

PS: My actual regex is more complicated; it's simplified here.
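For context, the empty string is expected behaviour rather than a bug: the R help page for strsplit describes how a match at the very beginning of a string makes the first element "", while a match at the very end is treated as if it were removed. A short check of both cases (the second string is made up for illustration):

```r
# The sample string from the question; it starts with a delimiter, "(".
Author <- "(Bernoulli)Cuatrec."

# strsplit() returns the text before the first match as the first piece.
# Nothing precedes "(", so that first piece is the empty string "".
parts <- strsplit(Author, "[()]")[[1]]
print(parts)

# A single delimiter at the very end, by contrast, adds no trailing "".
tail_case <- strsplit("Bernoulli(Cuatrec.)", "[()]")[[1]]
print(tail_case)
```

So only the leading empty piece needs handling in the question's case.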

Upvotes: 2

Views: 1164

Answers (3)

G. Grothendieck

Reputation: 269654

In the solutions below, set rmChars and splitChars (for the first solution) and chars (for the second solution) to patterns representing the actual sets of characters you need. Depending on what counts as a word in your data, you might be able to use a built-in class instead, such as chars <- "\\W", which matches every non-word character.
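A minimal sketch of the "\\W" idea on the sample string, assuming the names consist only of word characters. Note that "." is also a non-word character, so the trailing period of "Cuatrec." is dropped as well:

```r
s <- "(Bernoulli)Cuatrec."

# "\\W" matches any non-word character: here "(", ")" and ".".
chars <- "\\W"

# Replace each non-word character with a space, trim both ends,
# then split on the remaining single space.
res <- strsplit(trimws(gsub(chars, " ", s)), " ")[[1]]
print(res)
```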

1) First remove every (, then split on ). Assuming s is the input string:

rmChars <- "[(]"
splitChars <- "[)]"
strsplit(gsub(rmChars, "", s), splitChars)[[1]]

giving:

[1] "Bernoulli" "Cuatrec." 
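To see why this avoids the empty string, here are the two steps run separately (the intermediate names step1 and step2 are only for illustration). Once the leading ( is gone, nothing precedes the first remaining delimiter:

```r
s <- "(Bernoulli)Cuatrec."

# Step 1: delete every "(" so the string no longer starts with a delimiter.
step1 <- gsub("[(]", "", s)
print(step1)

# Step 2: split on ")"; the first piece is now "Bernoulli", not "".
step2 <- strsplit(step1, "[)]")[[1]]
print(step2)
```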

2) Another possibility is to replace each character in chars with a space, trim the ends and then split on space.

chars <- "[()]"
strsplit(trimws(gsub(chars, " ", s)), " ")[[1]]

giving:

[1] "Bernoulli" "Cuatrec." 
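The trimws() call is what removes the empty string here: without it, the leading space left behind by the replaced ( reproduces the original problem. A quick comparison (variable names are illustrative):

```r
s <- "(Bernoulli)Cuatrec."
chars <- "[()]"

# Each "(" or ")" becomes a space: " Bernoulli Cuatrec."
spaced <- gsub(chars, " ", s)

# Without trimming, the leading space still yields an empty first piece.
untrimmed <- strsplit(spaced, " ")[[1]]

# With trimws(), the leading space is gone before splitting.
trimmed <- strsplit(trimws(spaced), " ")[[1]]

print(untrimmed)
print(trimmed)
```

This approach assumes the names themselves contain no spaces; a name like "De Candolle" would be split in two.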

Upvotes: 3

SabDeM

Reputation: 7190

If your data always follow the same pattern, you can just use this:

strsplit(Author,"[[:punct:]]")[[1]][-1]
[1] "Bernoulli" "Cuatrec"  

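Of course, if the pattern is irregular, this solution is useless: [-1] always drops the first element, whether or not it is empty. Note also that [[:punct:]] matches the period, so "Cuatrec." comes back as "Cuatrec".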
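When the position of the empty string is not guaranteed, a sketch that filters pieces by content rather than by position holds up better. split_punct() below is a hypothetical helper, and the second input is a made-up irregular case; nzchar() is base R and returns TRUE for non-empty strings:

```r
# Hypothetical helper: split on punctuation, then keep only non-empty pieces.
split_punct <- function(x) {
  parts <- strsplit(x, "[[:punct:]]")[[1]]
  parts[nzchar(parts)]
}

regular   <- split_punct("(Bernoulli)Cuatrec.")
irregular <- split_punct("Bernoulli(Cuatrec.)")

# Dropping the first element blindly loses a real name in the irregular case.
positional <- strsplit("Bernoulli(Cuatrec.)", "[[:punct:]]")[[1]][-1]

print(regular)
print(irregular)
print(positional)
```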

Upvotes: 0

Rodrigo

Reputation: 5129

I usually try to avoid installing new libraries whenever possible, so I can simply do:

L <- strsplit(Author, "[()]")[[1]]
L <- L[L != ""]  # keep only the non-empty pieces

I thought there would be a solution without the need for a library.
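The same base-R idea extends to a whole vector of authors with lapply(), since strsplit() already returns one element per input string; nzchar(p) is an equivalent, slightly more direct test than p != "". The second author string is made up for illustration:

```r
authors <- c("(Bernoulli)Cuatrec.", "(Smith)Jones")

# strsplit() is vectorised: one character vector of pieces per input string.
pieces <- strsplit(authors, "[()]")

# Drop the empty pieces from each element of the list.
name_list <- lapply(pieces, function(p) p[nzchar(p)])
print(name_list)
```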

Upvotes: 0
