Jeremy

Reputation: 348

strsplit returns invisible element

I found a very strange behavior in strsplit(). It's similar to this question; however, I would love to know why it returns an empty element in the first place. Does someone know?

unlist(strsplit("88F5T7F4T13F", "\\d+"))  
[1] ""  "F" "T" "F" "T" "F"

Since I use that string for reproducing a long logical vector (88*FALSE 5*TRUE 7*FALSE 4*TRUE 13*FALSE), I have to be able to trust the result.

Dropping the first element with unlist(strsplit("88F5T7F4T13F", "\\d+"))[-1] works, but is it robust?

Upvotes: 3

Views: 541

Answers (1)

Wiktor Stribiżew

Reputation: 626861

The empty element appears because the string starts with digits. strsplit() splits at every match of \d+, and the first match (88) sits at the very start of the string, so the piece before it, between the start of the string and that match, is an empty string. That empty string is added as the first element of the result.
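
To make this concrete, here is a small sketch of how strsplit() treats a match at the start versus a match at the end of the input. The second string, "F5T7", is an assumed example and not from the question; the variable names are for illustration only.

```r
# Match at the start: the piece before "88" is empty, so "" comes first.
lead <- strsplit("88F5T7F4T13F", "\\d+")[[1]]
print(lead)

# Match at the end: strsplit() does not add a trailing "" element.
trail <- strsplit("F5T7", "\\d+")[[1]]
print(trail)
```

So the empty string only ever shows up in the first position, and only when the input begins with a match.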

You may use your own solution since it is already working well. If you are interested in alternative solutions, see below:

unlist(strsplit(sub("^\\d+", "", "88F5T7F4T13F"), "\\d+"))

This makes the empty element disappear from the result, since sub with the ^\d+ pattern removes all leading digits before the split (^ is the start of string and \d+ matches 1 or more digits). The downside is that it uses 2 regexps.
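
As for the robustness question, a quick comparison may help. The sketch below assumes a second input, "F5T7F", that starts with a letter instead of digits; the helper names drop_first and strip_then_split are hypothetical and only used here.

```r
# Hypothetical helpers, for illustration only.
drop_first <- function(x) unlist(strsplit(x, "\\d+"))[-1]
strip_then_split <- function(x) unlist(strsplit(sub("^\\d+", "", x), "\\d+"))

with_digits <- "88F5T7F4T13F"  # starts with digits, as in the question
no_digits <- "F5T7F"           # assumed input that starts with a letter

drop_first(with_digits)        # the empty first element is removed
strip_then_split(with_digits)  # same letters, no empty element to remove
drop_first(no_digits)          # the leading "F" is lost
strip_then_split(no_digits)    # all three letters are kept
```

In other words, [-1] is safe as long as every input is guaranteed to start with digits, while stripping the leading digits first does not rely on that guarantee.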

library(stringr)
s <- "88F5T7F4T13F"
res <- str_extract_all(s, "\\D+")  # a list with one character vector per input string
unlist(res)

This requires only one regex, \D+ (1 or more non-digit characters), and one external library. Because it extracts the letters instead of splitting at the digits, no empty element is produced.

If you want to do a similar thing with base R, use regmatches with gregexpr:

s <- "88F5T7F4T13F"
unlist(regmatches(s, gregexpr("\\D+", s)))  # regmatches returns a list, one entry per input string
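
Since the stated goal is to rebuild the logical vector, the same extraction idea can be applied to the counts as well. This is only a sketch under the assumption that the string strictly alternates a run length with a single T or F; the variable names are illustrative.

```r
s <- "88F5T7F4T13F"

# Run lengths: every run of digits, converted to integers.
counts <- as.integer(regmatches(s, gregexpr("\\d+", s))[[1]])

# Run values: every run of non-digits, mapped to TRUE/FALSE.
flags <- regmatches(s, gregexpr("\\D+", s))[[1]] == "T"

# Guard against malformed input where counts and letters do not pair up.
stopifnot(length(counts) == length(flags))

# Repeat each flag by its own count.
v <- rep(flags, times = counts)
length(v)
```

Extracting both parts avoids the empty element entirely, and the length check catches inputs that do not follow the assumed format.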

Upvotes: 1
