Split R string at spaces but not when the space is between single quotes

Question

I have an ugly and complex set of strings that I have to split:

vec <- c("'01'", "'01' '02'", 
         "#bateau", "#bateau #batiment",
         "#'autres 32'", "#'autres 32' #'batiment 30'", "#'autres 32' #'batiment 30' #'contenu 31'",
         "#'34'", "#'34' #'33' #'35'")
vec
[1] "'01'"                                      "'01' '02'"                                
[3] "#bateau"                                   "#bateau #batiment"                        
[5] "#'autres 32'"                              "#'autres 32' #'batiment 30'"              
[7] "#'autres 32' #'batiment 30' #'contenu 31'" "#'34'"                                    
[9] "#'34' #'33' #'35'"

I need to split each string at every space, except when the space is between single quotes ('). So in the example above, '01' '02' would become '01' and '02', while #'autres 32' #'batiment 30' would become #'autres 32' and #'batiment 30'.

I've tried getting inspiration from this question, but didn't get far:

strsplit(vec, "(\\s[^']+?)('.*?'|$)")

but this splits at some spaces it shouldn't and makes me lose some information as well.

The result from the split should be something like:

res <- c("'01'", "'01'", "'02'", 
         "#bateau", "#bateau", "#batiment",
         "#'autres 32'", "#'autres 32'", "#'batiment 30'", "#'autres 32'", "#'batiment 30'", "#'contenu 31'",
         "#'34'", "#'34'", "#'33'", "#'35'")

What would be the proper regular expression to split this string?

Thanks

Wiktor Stribiżew · Accepted Answer

You may use

strsplit(vec, "'[^']*'(*SKIP)(*F)|\\s+", perl=TRUE)

See the R demo and the regex demo online.

Details

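'[^']*'(*SKIP)(*F) - matches ', then zero or more characters other than ' (the [^']* part), then a closing '. The (*SKIP)(*F) verbs then discard this matched text and make the engine look for the next match starting at the position where the failure occurred, i.e. right after the quoted substring
| - or
\s+ - one or more whitespace characters (written as \\s+ inside an R string literal).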

Because (*SKIP) and (*F) are PCRE backtracking verbs, perl=TRUE is required.
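As a minimal sketch of how this could be applied to the vector from the question: strsplit returns a list with one element per input string, so unlist is used here to flatten it into the single character vector the question asks for. The names pattern and parts are only illustrative, and the final comparison against res assumes the expected result given in the question.

```r
# Input strings from the question
vec <- c("'01'", "'01' '02'",
         "#bateau", "#bateau #batiment",
         "#'autres 32'", "#'autres 32' #'batiment 30'",
         "#'autres 32' #'batiment 30' #'contenu 31'",
         "#'34'", "#'34' #'33' #'35'")

# Skip over quoted substrings, split on any remaining whitespace.
# The backslash is doubled because this is an R string literal.
pattern <- "'[^']*'(*SKIP)(*F)|\\s+"

# strsplit gives a list (one element per input string);
# unlist flattens it into one character vector.
parts <- unlist(strsplit(vec, pattern, perl = TRUE))
print(parts)

# Expected result as written in the question
res <- c("'01'", "'01'", "'02'",
         "#bateau", "#bateau", "#batiment",
         "#'autres 32'", "#'autres 32'", "#'batiment 30'",
         "#'autres 32'", "#'batiment 30'", "#'contenu 31'",
         "#'34'", "#'34'", "#'33'", "#'35'")

# Check that the split matches the expected result
identical(parts, res)
```

If the pieces need to stay grouped by their original string, dropping the unlist call keeps the list structure that strsplit returns.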
