Reputation: 1971
I want to merge multiple spaces into a single space (the whitespace could also be a tab) and remove leading/trailing spaces.
For example...
string <- "Hi buddy what's up Bro"
to
"Hi buddy what's up bro"
I checked the solution given at "Regex to replace multiple spaces with a single space". Note: don't put a literal \t or \n in the toy string in place of the spaces and feed that as the pattern to gsub. I want this in R.
Note that I am unable to put multiple spaces in the toy string. Thanks.
Upvotes: 93
Views: 80490
Reputation: 144
This seems to work.
It doesn't remove whitespace at the beginning or end of the sentence, as Rich Scriven's answer does,
but it merges multiple whitespace characters into one.
library("stringr")
string <- "Hi buddy what's up Bro"
str_replace_all(string, "\\s+", " ")
# [1] "Hi buddy what's up Bro"
Upvotes: 0
Reputation: 1161
Or simply try the str_squish function from stringr:
library(stringr)
string <- " Hi buddy what's up Bro "
str_squish(string)
# [1] "Hi buddy what's up Bro"
Upvotes: 76
Reputation: 803
Another solution using strsplit:
Split the text into words, then concatenate the words using the paste function.
string <- "Hi buddy what's up Bro"
# Split on single spaces, then drop the empty strings left by repeated spaces
stringsplit <- sapply(strsplit(string, " "), function(x) x[x != ""])
paste(stringsplit, collapse = " ")
For more than one document:
string <- c("Hi buddy what's up Bro", " an example using strsplit ")
# One character vector of non-empty words per document
stringsplit <- lapply(strsplit(string, " "), function(x) x[x != ""])
sapply(stringsplit, function(d) paste(d, collapse = " "))
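The code above splits on a literal space, so tabs and newlines are left alone. A minimal sketch of a variant that splits on the regex "\\s+" instead is shown here; the helper name squish_split is made up for illustration and is not part of any package.

```r
# Hypothetical helper: split on runs of any whitespace, drop empty pieces,
# and glue the words back together with single spaces.
squish_split <- function(x) {
  words <- strsplit(x, "\\s+")
  vapply(
    words,
    function(w) paste(w[nzchar(w)], collapse = " "),
    character(1)
  )
}

docs <- c("  Hi \t buddy ", "an   example\nusing strsplit")
squish_split(docs)
# [1] "Hi buddy"                  "an example using strsplit"
```

Using vapply with character(1) keeps the result a plain character vector, one element per input document.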
Upvotes: 0
Reputation: 184
For this purpose there is no need to load any extra libraries, as gsub() from base R does the work.
Remove leading and trailing whitespace with trimws()
and replace the extra whitespace using gsub(),
as mentioned by @Adam Erickson.
string <- " Hi buddy what's up Bro "
trimws(gsub("\\s+", " ", string))
Here \\s+ matches one or more whitespace characters (spaces, tabs, newlines) and gsub replaces each such run with a single space.
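As a quick check that tabs and newlines are covered too, here is a small sketch using only base R. The wrapper name squish_base and the test strings are made up for illustration.

```r
# Hypothetical wrapper around the two base R calls used in this answer.
squish_base <- function(x) {
  trimws(gsub("\\s+", " ", x))
}

# Tabs and newlines are written as escapes so they survive copy and paste.
messy <- c("  Hi \t buddy\n what's   up Bro  ", "\tno  tabs\tleft ")
squish_base(messy)
# [1] "Hi buddy what's up Bro" "no tabs left"
```

Both gsub and trimws are vectorised, so the same call works on a whole character vector.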
To see what a regular expression is doing, visit this link, as mentioned by @Tyler Rinker.
Just paste in the regular expression you want explained and it will do the rest.
Upvotes: 0
Reputation: 6363
You do not need to import external libraries to perform such a task:
string <- " Hi buddy what's up Bro "
string <- gsub("\\s+", " ", string)
string <- trimws(string)
string
[1] "Hi buddy what's up Bro"
Or, in one line:
string <- trimws(gsub("\\s+", " ", string))
Much cleaner.
Upvotes: 36
Reputation: 109984
Another approach using a single regex:
gsub("(?<=[\\s])\\s*|^\\s+|\\s+$", "", string, perl=TRUE)
Explanation (from)
NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?<=                     look behind to see if there is:
--------------------------------------------------------------------------------
    [\s]                     any character of: whitespace (\n, \r,
                             \t, \f, and " ")
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
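One consequence of the look-behind is worth spelling out: the pattern keeps the first whitespace character of each run and deletes the rest, so a run that starts with a tab stays a tab. The sketch below compares it with the two-step gsub/trimws approach; the function names one_pass and two_step are made up for illustration.

```r
# Hypothetical wrappers, for side-by-side comparison only.
one_pass <- function(x) {
  gsub("(?<=[\\s])\\s*|^\\s+|\\s+$", "", x, perl = TRUE)
}
two_step <- function(x) {
  trimws(gsub("\\s+", " ", x))
}

# When every run starts with a plain space, the two agree.
spaces_only <- "  Hi  buddy   what's up  "
identical(one_pass(spaces_only), two_step(spaces_only))
# [1] TRUE

# When a run starts with a tab, the single regex keeps that tab.
tab_first <- "a\t b"
one_pass(tab_first)
# [1] "a\tb"
two_step(tab_first)
# [1] "a b"
```

If the input may contain tabs or newlines and a literal space is wanted in the output, the two-step form is the safer choice.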
Upvotes: 44
Reputation: 99361
This seems to meet your needs.
string <- " Hi buddy what's up Bro "
library(stringr)
str_replace(gsub("\\s+", " ", str_trim(string)), "B", "b")
# [1] "Hi buddy what's up bro"
Upvotes: 82
Reputation: 109984
The qdapRegex package has the rm_white function to handle this:
library(qdapRegex)
rm_white(string)
## [1] "Hi buddy what's up Bro"
Upvotes: 6
Reputation: 887571
You could also try clean from qdap:
library(qdap)
library(stringr)
str_trim(clean(string))
#[1] "Hi buddy what's up Bro"
Or, as suggested by @Tyler Rinker (using only qdap):
Trim(clean(string))
#[1] "Hi buddy what's up Bro"
Upvotes: 4