Reputation: 19803
I have strings of the following variety:
A B C Company
XYZ Inc
S & K Co
I would like to remove only the spaces that sit between one-letter words. For example, in the first string I would like to remove the spaces between A, B and C, but not the space between C and Company. The result should be:
ABC Company
XYZ Inc
S&K Co
What is the proper regular expression to use in gsub for this?
Upvotes: 14
Views: 2862
Reputation: 99331
Obligatory strsplit / paste answer. This will also catch single characters that are in the middle or at the end of the string.
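x <- c('A B C Company', 'XYZ Inc', 'S & K Co',
       'A B C D E F G Company', 'Company A B C', 'Co A B C mpany')
# inside foo, x is the vector of words from one string
foo <- function(x) {
  # overwrite every one-letter word with the concatenation of all of them
  x[nchar(x) == 1L] <- paste(x[nchar(x) == 1L], collapse = "")
  # drop the repeated copies, then rejoin the words with single spaces
  paste(unique(x), collapse = " ")
}
vapply(strsplit(x, " "), foo, character(1L))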
# [1] "ABC Company" "XYZ Inc" "S&K Co"
# [4] "ABCDEFG Company" "Company ABC" "Co ABC mpany"
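One caveat with the function above: because it pools every one-letter word in the string and then calls unique, two separate runs of single letters end up merged ("A B Co C D" becomes "ABCD Co"), and a longer word that appears twice is kept only once. If that matters for your data, here is a minimal run-aware sketch. The name collapse_runs is hypothetical and not part of the original answer; it assumes words are separated by single spaces, as in the examples.

```r
# Hypothetical helper: collapse each consecutive run of one-letter words
# separately, leaving all other words (including repeats) untouched.
collapse_runs <- function(words) {
  is_single <- nchar(words) == 1L
  runs <- rle(is_single)                          # consecutive runs of TRUE/FALSE
  run_id <- rep(seq_along(runs$lengths), runs$lengths)
  groups <- split(words, run_id)                  # one group of words per run
  pieces <- vapply(seq_along(groups), function(i) {
    # runs of one-letter words are glued together; other runs keep their spaces
    paste(groups[[i]], collapse = if (runs$values[i]) "" else " ")
  }, character(1L))
  paste(pieces, collapse = " ")
}

y <- c("A B C Company", "Company A B C", "A B Co C D", "Co Co A B C")
vapply(strsplit(y, " "), collapse_runs, character(1L))
# [1] "ABC Company" "Company ABC" "AB Co CD"    "Co Co ABC"
```

On the six strings in x it gives the same results as foo; it only differs when a string has more than one run of single letters or a repeated longer word.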
Upvotes: 10
Reputation: 174696
You could also do this with the PCRE verbs (*SKIP)(*F):
> x <- c('A B C Company', 'XYZ Inc', 'S & K Co', 'A B C D E F G Company', ' H & K')
> gsub("\\s*\\S\\S+\\s*(*SKIP)(*F)|(?<=\\S)\\s+(?=\\S)", "", x, perl=TRUE)
[1] "ABC Company" "XYZ Inc" "S&K Co" "ABCDEFG Company"
[5] " H&K"
Explanation:
\\s*\\S\\S+\\s* matches a run of two or more non-space characters, along with any spaces before and after it.
(*SKIP)(*F) makes that match fail, and the next attempt resumes after the skipped text rather than inside it.
| separates the two alternatives; the right-hand side only ever sees the text that was not skipped.
(?<=\\S)\\s+(?=\\S) matches one or more spaces that are preceded by a non-space character and followed by a non-space character.
Note: as the last element shows, this regex won't remove spaces at the start of the string, because they are not preceded by a non-space character.
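To see which spaces survive the (*SKIP)(*F) branch, you can ask gregexpr for the match positions instead of replacing them. This is only an illustrative check on the first example string, not part of the original answer:

```r
pattern <- "\\s*\\S\\S+\\s*(*SKIP)(*F)|(?<=\\S)\\s+(?=\\S)"
s <- "A B C Company"

# Start positions (1-based) of every space the pattern actually matches
hits <- gregexpr(pattern, s, perl = TRUE)[[1]]
as.integer(hits)
# [1] 2 4

# The space before "Company" is consumed by the skipped branch, so it is kept
gsub(pattern, "", s, perl = TRUE)
# [1] "ABC Company"
```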
Upvotes: 0
Reputation: 70722
Here is one way you could do this, seeing as & is mixed in and is not a word character:
x <- c('A B C Company', 'XYZ Inc', 'S & K Co', 'A B C D E F G Company')
gsub('(?<!\\S\\S)\\s+(?=\\S(?!\\S))', '', x, perl=TRUE)
# [1] "ABC Company" "XYZ Inc" "S&K Co" "ABCDEFG Company"
Explanation:
First we assert that the current position is not preceded by two consecutive non-whitespace characters. Then we match whitespace one or more times. Finally we look ahead to assert that a non-whitespace character follows, and that this character is not itself followed by another non-whitespace character.
(?<!          # look behind to see if there is not:
  \S          #   non-whitespace (all but \n, \r, \t, \f, and " ")
  \S          #   non-whitespace (all but \n, \r, \t, \f, and " ")
)             # end of look-behind
\s+           # whitespace (\n, \r, \t, \f, and " ") (1 or more times)
(?=           # look ahead to see if there is:
  \S          #   non-whitespace (all but \n, \r, \t, \f, and " ")
  (?!         #   look ahead to see if there is not:
    \S        #     non-whitespace (all but \n, \r, \t, \f, and " ")
  )           #   end of look-ahead
)             # end of look-ahead
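If you want to keep that breakdown next to the code, PCRE's free-spacing flag (?x) lets you write the same pattern across several lines with comments. A minimal sketch, checked here against the strings from this thread, including ones with single letters in the middle and at the end:

```r
# Same pattern as above, written in free-spacing mode: (?x) makes PCRE
# ignore literal whitespace in the pattern and treat # as a comment start.
pattern <- "(?x)
  (?<! \\S\\S )        # not preceded by two non-whitespace characters
  \\s+                 # one or more whitespace characters
  (?= \\S (?! \\S ) )  # followed by exactly one non-whitespace character
"

x <- c('A B C Company', 'XYZ Inc', 'S & K Co',
       'A B C D E F G Company', 'Company A B C', 'Co A B C mpany')
gsub(pattern, '', x, perl = TRUE)
# [1] "ABC Company"     "XYZ Inc"         "S&K Co"
# [4] "ABCDEFG Company" "Company ABC"     "Co ABC mpany"
```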
Upvotes: 19
Reputation: 7948
Coming late to the game, but would this pattern work for you?
(?<!\\S\\S)\\s+(?!\\S\\S)
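A quick check of this pattern on the strings from the question. It is a sketch on these inputs only, not a proof that the two look-around patterns are equivalent; the variable names are just for illustration. They do differ on trailing whitespace, as the last two lines show.

```r
x <- c('A B C Company', 'XYZ Inc', 'S & K Co', 'A B C D E F G Company')

not_two  <- '(?<!\\S\\S)\\s+(?!\\S\\S)'       # this answer
just_one <- '(?<!\\S\\S)\\s+(?=\\S(?!\\S))'   # pattern from the earlier answer

gsub(not_two, '', x, perl = TRUE)
# [1] "ABC Company"     "XYZ Inc"         "S&K Co"          "ABCDEFG Company"

# A trailing space is removed by not_two but kept by just_one
gsub(not_two,  '', 'A B ', perl = TRUE)   # "AB"
gsub(just_one, '', 'A B ', perl = TRUE)   # "AB "
```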
Upvotes: 7