Alex

How to put a space in between a list of strings?

This is my current dataset:

c("Jetstar","Qantas", "QantasLink","RegionalExpress","TigerairAustralia", 
   "VirginAustralia","VirginAustraliaRegionalAirlines","AllAirlines", 
   "Qantas-allQFdesignatedservices","VirginAustralia-allVAdesignatedservices")

I want to separate the words in each airline name with spaces.

I tried this code:

airlines$airline <- gsub("([[:lower:]]) ([[:upper:]])", "\\1 \\2", airlines$airline)

But the text came back in the same format as before.

My desired output is shown below:

[Image: desired output]
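The attempt leaves the text unchanged because the pattern has a literal space between the two capture groups. It therefore only matches a lowercase letter, then a space, then an uppercase letter, and that sequence never occurs in these names. Below is a minimal sketch of the corrected call. It assumes the names are in a plain character vector (the hypothetical `airline_names` stands in for `airlines$airline`):

```r
# Hypothetical sample of names; in the question they live in airlines$airline
airline_names <- c("QantasLink", "RegionalExpress", "Qantas-allQFdesignatedservices")

# No space between the two groups: match a lowercase letter immediately
# followed by an uppercase letter, then rejoin them with a space
spaced <- gsub("([[:lower:]])([[:upper:]])", "\\1 \\2", airline_names)
print(spaced)
```

This splits the CamelCase names, but "QFdesignatedservices" stays joined. The hyphenated names need extra rules for that reason.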

Answers (3)

IRTFM

txt <- c("Jetstar","Qantas", "QantasLink","RegionalExpress","TigerairAustralia", 
"VirginAustralia","VirginAustraliaRegionalAirlines","AllAirlines", 
"Qantas-allQFdesignatedservices","VirginAustralia-allVAdesignatedservices")

You need two different sorts of rules: one that inserts spaces at the case changes and another for recurring words ("all", "designated", "services") or symbols ("-"). You could start with a pattern that matches a lowercase letter followed by an uppercase letter, each matched with a character class such as "[a-z]" or "[A-Z]". Put each letter in its own capture group (created by wrapping part of the pattern in parentheses), then insert a space between the two back-references in the replacement. See the Details section of ?regex for a quick description of character classes and back-references:

gsub("([a-z])([A-Z])", "\\1 \\2", txt)

You then pass that result to a second gsub() call, which adds a space before each recurring word or symbol that you also want separated:

gsub("(-|all|designated|services)", " \\1", # second pattern and sub for "specials"
gsub("([a-z])([A-Z])", "\\1 \\2", txt))  #first pattern and sub for case changes

 [1] "Jetstar"                                      
 [2] "Qantas"                                       
 [3] "Qantas Link"                                  
 [4] "Regional Express"                             
 [5] "Tigerair Australia"                           
 [6] "Virgin Australia"                             
 [7] "Virgin Australia Regional Airlines"           
 [8] "All Airlines"                                 
 [9] "Qantas - all QF designated services"          
[10] "Virgin Australia - all VA designated services"
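To apply both substitutions to the column from the question, you can wrap them in a small function. This is a sketch. The function name `space_airline_names` and the one-column `airlines` data frame built here are hypothetical stand-ins for the asker's data. The word list is the same one used above.

```r
# Hypothetical helper wrapping the two gsub() steps from this answer
space_airline_names <- function(x) {
  # Step 1: space at every lowercase -> uppercase boundary
  x <- gsub("([a-z])([A-Z])", "\\1 \\2", x)
  # Step 2: space before the recurring tokens
  gsub("(-|all|designated|services)", " \\1", x)
}

# Hypothetical stand-in for the asker's data frame
airlines <- data.frame(
  airline = c("TigerairAustralia", "AllAirlines",
              "VirginAustralia-allVAdesignatedservices"),
  stringsAsFactors = FALSE
)

airlines$airline <- space_airline_names(airlines$airline)
print(airlines$airline)
```

Step 2 matches the tokens anywhere in a string. A name with "all" inside a word (a hypothetical "Ballina Air", say) would therefore also get an unwanted space ("B allina Air").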

I see that someone upvoted my earlier answer to Splitting CamelCase in R which was similar, but this one had a few more wrinkles to iron out.

Lunalo John

I tried to work it out and came up with this:

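library(stringr)

data_vec <- c("Jetstar", "Qantas", "QantasLink", "RegionalExpress", "TigerairAustralia",
              "VirginAustralia", "VirginAustraliaRegionalAirlines", "AllAirlines",
              "Qantas-allQFdesignatedservices", "VirginAustralia-allVAdesignatedservices")

# 1. Insert a space before each run of one or two capital letters
spaced <- gsub("([A-Z]{1,2})", " \\1", data_vec)
# 2. Insert a space before a lowercase letter that follows two capitals ("QFd" -> "QF d");
#    the lookbehind (?<=...) is only supported with perl = TRUE
spaced <- gsub("(?<=[A-Z]{2})([a-z])", " \\1", spaced, perl = TRUE)
# 3. Drop the leading space added in step 1
#    (lowercase runs such as "designatedservices" stay joined)
str_trim(spaced)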

I hope this helps.

GWD

This could (almost) do the trick:

trimws(gsub("([A-Z])", " \\1", airlines$airline))  # space before each capital, then drop the leading space

Borrowed from the question "Splitting CamelCase in R".

Of course, names like Qantas-allQFd… will still pose a problem. The two consecutive uppercase letters ("QF") in the second part of the string end up split as "Q F", and lowercase runs such as "designatedservices" are not split at all.
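One way to keep an acronym such as "QF" together is to insert the space at zero-width positions instead of before every capital. The sketch below is a swapped-in variant that uses Perl-style lookarounds, so it needs perl = TRUE. It splits only where a lowercase letter is followed by an uppercase one, or where two capitals are followed by a lowercase letter. The vector name `names_in` is hypothetical. "-all" and "designatedservices" are still left untouched, so a word-list step like the one in the first answer is still needed for those.

```r
names_in <- c("VirginAustraliaRegionalAirlines", "Qantas-allQFdesignatedservices")

# Insert a space (zero-width match) where:
#   - a lowercase letter is followed by an uppercase letter, or
#   - two uppercase letters are followed by a lowercase letter (end of an acronym)
pattern <- "(?<=[a-z])(?=[A-Z])|(?<=[A-Z]{2})(?=[a-z])"
spaced <- gsub(pattern, " ", names_in, perl = TRUE)
print(spaced)
```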
