Smith Black

Reputation: 535

How to remove extra white space between words inside a character vector?

Suppose I have a character vector like

"Hi,  this is a   good  time to   start working   together."

I just want to have

"Hi, this is a good time to start working together."

with only one space between words. How can I do this in R?

Upvotes: 40

Views: 36993

Answers (4)

LMc

Reputation: 18622

The package textclean has many useful tools for processing text. replace_white would be useful here:

v <- "Hi,  this is a   good  time to   start working   together."

textclean::replace_white(v)
# [1] "Hi, this is a good time to start working together."

Upvotes: 1

Wiktor Stribiżew

Reputation: 626748

Since the title of the question is "remove the extra whitespace between words", here is a solution that leaves leading and trailing whitespace untouched (assuming the "words" are chunks of non-whitespace characters):

gsub("(\\S)\\s{2,}(?=\\S)", "\\1 ", text, perl=TRUE)
stringr::str_replace_all(text, "(\\S)\\s{2,}(?=\\S)", "\\1 ")
## Or, if the whitespace to keep is the last whitespace char of each match
gsub("(\\S)(\\s){2,}(?=\\S)", "\\1\\2", text, perl=TRUE)
stringr::str_replace_all(text, "(\\S)(\\s){2,}(?=\\S)", "\\1\\2")

See regex demo #1 and regex demo #2 and this R demo.

Regex details:

  • (\S) - Capturing group 1 (\1 refers to this group value from the replacement pattern): a non-whitespace char
  • \s{2,} - two or more whitespace chars (in Regex #2, \s is wrapped in parentheses to form capturing group 2, and since the group is repeated, \2 holds the last whitespace char matched)
  • (?=\S) - a positive lookahead that requires a non-whitespace char immediately to the right of the current location.
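
A quick way to see how the two patterns differ, as a minimal sketch. The sample string x is made up for illustration and is not from the question:

```r
# Hypothetical sample: leading/trailing spaces plus a tab-newline run inside
x <- "  a  b\t\nc  "

# Pattern 1: every inner run of 2+ whitespace chars becomes a plain space
res1 <- gsub("(\\S)\\s{2,}(?=\\S)", "\\1 ", x, perl = TRUE)

# Pattern 2: every inner run is replaced by its own last whitespace char
res2 <- gsub("(\\S)(\\s){2,}(?=\\S)", "\\1\\2", x, perl = TRUE)

print(res1)  # "  a b c  "
print(res2)  # "  a b\nc  "
```

In both results the leading and trailing spaces are left as they were, because the capturing group and the lookahead each require a non-whitespace char next to the run.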

Upvotes: 3

Koot6133

Reputation: 1480

Another option is the str_squish function from the stringr package, which also trims leading and trailing whitespace:

library(stringr)
string <- "Hi,  this is a   good  time to   start working   together."
str_squish(string)
#[1] "Hi, this is a good time to start working together."

Upvotes: 31

thelatemail

Reputation: 93813

gsub is your friend:

test <- "Hi,  this is a   good  time to   start working   together."
gsub("\\s+"," ",test)
#[1] "Hi, this is a good time to start working together."

\\s+ matches one or more consecutive whitespace characters (space, tab, etc.), and gsub replaces each such run with a single space " ".
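
Note that a run of whitespace at the very start or end of a string is also collapsed to one space rather than removed. If that matters, one option is to wrap the call in base R's trimws. A small sketch, with a made-up input vector that is not from the question:

```r
# Made-up input with leading/trailing whitespace (not from the question)
messy <- c("  Hi,  this is   a test.  ", "one\t\ttwo ")

# Collapse every whitespace run to one space; edge spaces remain
collapsed <- gsub("\\s+", " ", messy)

# Strip the remaining leading/trailing whitespace
cleaned <- trimws(collapsed)

print(collapsed)
print(cleaned)
```

Both gsub and trimws are vectorized, so this works on a whole character vector at once.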

Upvotes: 59
