strsplit returns invisible element

Question

I found a very strange behavior in strsplit(). It's similar to this question; however, I would love to know why it returns an empty element in the first place. Does anyone know?

unlist(strsplit("88F5T7F4T13F", "\\d+"))
[1] ""  "F" "T" "F" "T" "F"

Since I use that string to reproduce a long logical vector (88*FALSE 5*TRUE 7*FALSE 4*TRUE 13*FALSE), I have to trust it...
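Because the string is a run-length encoding of that logical vector, here is a minimal base R sketch of the decoding step. It assumes the letters are always exactly F or T and that every letter run is preceded by its count. The variable names are illustrative, and rle() is used only as a round-trip check.

```r
# Assumes a string of <count><letter> pairs, with letters F or T only
s <- "88F5T7F4T13F"

# Run lengths: every run of digits, converted to integers
counts <- as.integer(regmatches(s, gregexpr("\\d+", s))[[1]])
# Run values: every run of non-digits, mapped to TRUE ("T") or FALSE ("F")
flags <- regmatches(s, gregexpr("\\D+", s))[[1]] == "T"

# Expand each value by its count
v <- rep(flags, times = counts)
length(v)  # 117
sum(v)     # 9

# Round-trip check: rle() recovers the counts and values here
# because adjacent runs always alternate between FALSE and TRUE
r <- rle(v)
stopifnot(r$lengths == counts, r$values == flags)
```

Note that rle() merges adjacent runs with the same value, so the round-trip check would fail for an input such as "2F3F", even though the decoding itself still works.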

unlist(strsplit("88F5T7F4T13F", "\\d+"))[-1] works, but is it robust?

Wiktor Stribiżew · Accepted Answer

The empty element appears because the string starts with digits. Since you split at digits, the first split occurs right between the start of the string and the first F, and the empty string before that first match is added to the resulting list.
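The strsplit() help page describes the algorithm: while the input is non-empty, the text to the left of the next match is appended to the output, and the match (with everything before it) is removed. A match at position 1 therefore contributes "", whereas a match at the very end contributes nothing. The short inputs below are made up for illustration, and split_runs() is a hypothetical helper, not part of base R.

```r
# Leading match: strsplit() emits "" for the empty text before it
strsplit("88F5T7F4T13F", "\\d+")[[1]]
# "" "F" "T" "F" "T" "F"

# No leading digits: no empty element
strsplit("F5T7F", "\\d+")[[1]]
# "F" "T" "F"

# Trailing match: nothing is added after it
strsplit("F5T7", "\\d+")[[1]]
# "F" "T"

# Consequence for [-1]: it silently drops a real element
# when the input does not start with digits
unlist(strsplit("F5T7F", "\\d+"))[-1]
# "T" "F"

# Hypothetical helper: drop empty strings instead of a fixed position
split_runs <- function(x) {
  parts <- unlist(strsplit(x, "\\d+"))
  parts[nzchar(parts)]  # keep only non-empty pieces
}
split_runs("88F5T7F4T13F")  # "F" "T" "F" "T" "F"
split_runs("F5T7F")         # "F" "T" "F"
```

For inputs in the question's format, which always start with a count, [-1] and split_runs() give the same result; the filter only matters if that assumption might not hold.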

You may use your own solution since it is already working well. If you are interested in alternative solutions, see below:

unlist(strsplit(sub("^\\d+", "", "88F5T7F4T13F"), "\\d+"))

This makes the empty element disappear from the split result, since sub() with the ^\d+ pattern removes all leading digits (^ anchors the match at the start of the string and \d+ matches one or more digits). However, it is not the most efficient option, since it runs two regex operations.
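A small check of the sub() approach on two inputs, one with and one without leading digits (the second string is made up for illustration). sub() leaves a string without leading digits unchanged, and both sub() and strsplit() are vectorised, so each input gets its own result vector.

```r
inputs <- c("88F5T7F4T13F", "F5T7F4T13F")

# Strip leading digits; the second string has none, so it is returned as-is
sub("^\\d+", "", inputs)
# "F5T7F4T13F" "F5T7F4T13F"

# strsplit() returns a list with one character vector per input,
# and neither vector starts with an empty string
parts <- strsplit(sub("^\\d+", "", inputs), "\\d+")
parts[[1]]  # "F" "T" "F" "T" "F"
parts[[2]]  # "F" "T" "F" "T" "F"
```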

library(stringr)
s <- "88F5T7F4T13F"
res <- unlist(str_extract_all(s, "\\D+"))  # str_extract_all() returns a list; unlist() gives a vector

This requires only one regex, \D+ (one or more non-digit characters), plus one external library, stringr.

If you want to do a similar thing with base R, use regmatches with gregexpr:

s <- "88F5T7F4T13F"
regmatches(s, gregexpr("\\D+", s))
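Like the sub() approach, the regmatches() call is vectorised: it returns a list with one character vector per input string. No empty leading element can appear, because only the non-digit runs are extracted rather than the pieces between digit runs. A sketch with made-up inputs:

```r
inputs <- c("88F5T7F4T13F", "F5T7F4T13F", "3T2F")

# gregexpr() locates every run of non-digits in each string;
# regmatches() extracts those runs, one character vector per input
runs <- regmatches(inputs, gregexpr("\\D+", inputs))
runs[[1]]  # "F" "T" "F" "T" "F"
runs[[3]]  # "T" "F"
```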
