Reputation: 1300
I have a string like this:
foo > bar > foo bar > foo > test > test this
I would like to take each string between the greater-than signs and convert it into a single word with no spaces inside it, while preserving all other spaces, like this:
foo > bar > foobar > foo > test > testthis
I've tried using gsub to remove whitespace, gsub(" ", "", x, fixed = TRUE), but I am not sure how to do this only between the greater-than signs while preserving the spaces next to the greater-than signs.
Upvotes: 4
Views: 175
Reputation: 33488
Here is an (arguably) more tractable solution for the non-regex-expert:
# Input string (taken from the question)
str1 <- "foo > bar > foo bar > foo > test > test this"
# Split into parts
str2 <- unlist(strsplit(str1, ">"))
str2
[1] "foo " " bar " " foo bar " " foo " " test " " test this"
# Eliminate all spaces
str3 <- gsub(" ", "", str2)
str3
[1] "foo" "bar" "foobar" "foo" "test" "testthis"
# And now paste again together
str_final <- paste(str3, collapse = " > ")
str_final
[1] "foo > bar > foobar > foo > test > testthis"
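The three steps above can be wrapped into one function that also works on a character vector. This is only a sketch: the name squash_between and its sep argument are hypothetical and not part of the answer. Note that, like the step-by-step version, it rebuilds the string with exactly one space on each side of the separator, whatever spacing the input had.

```r
# Hypothetical helper: split on the separator, strip all whitespace
# from each piece, then glue the pieces back together with " > ".
squash_between <- function(x, sep = ">") {
  parts <- strsplit(x, sep, fixed = TRUE)
  vapply(
    parts,
    function(p) paste(gsub("\\s+", "", p), collapse = paste0(" ", sep, " ")),
    character(1)
  )
}

str1 <- "foo > bar > foo bar > foo > test > test this"
squash_between(str1)
# [1] "foo > bar > foobar > foo > test > testthis"

squash_between(c("a b > c d", "x > y z"))
# [1] "ab > cd" "x > yz"
```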
Upvotes: 0
Reputation: 626861
You may achieve what you want with a pattern that matches and captures > enclosed with whitespace (using (\s*>\s*)) and matches, without capturing, all other chunks of 1+ whitespace (\s+). All you need to make the pattern work is to replace with a backreference to the Group 1 value (\1):
gsub("(\\s*>\\s*)|\\s+", "\\1", x)
Or, to account for Unicode strings,
gsub("(*UCP)(\\s*>\\s*)|\\s+", "\\1", x, perl=TRUE)
See the regex demo.
Details
(\s*>\s*) - Capturing group 1: 0+ whitespaces, >, 0+ whitespaces
| - or
\s+ - 1+ whitespace chars.
See R demo online:
x <- "foo > bar > foo bar > foo > test > test this"
gsub("(\\s*>\\s*)|\\s+", "\\1", x)
## => [1] "foo > bar > foobar > foo > test > testthis"
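A small sketch of how this replacement treats uneven spacing around >. The input string x2 is made up for illustration. Because the whole of Group 1 is put back unchanged, whatever whitespace surrounded > in the input is kept as it was, and only the remaining whitespace is dropped.

```r
# Made-up input: two spaces before ">", none after it
x2 <- "a  >b c"
res <- gsub("(\\s*>\\s*)|\\s+", "\\1", x2)
res
## => [1] "a  >bc"
```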
Upvotes: 1
Reputation: 887128
One option would be the PCRE (*SKIP)(*FAIL) idiom. The first alternative matches zero or more whitespace characters (\\s*), followed by >, followed by zero or more whitespace characters (\\s*). Once (*SKIP) is reached, the engine will not backtrack to the left of that point, and (*FAIL) then forces the alternative to fail, so the text consumed so far is skipped instead of being retried. The second alternative (|\\s+), to the right of (*FAIL), matches the remaining whitespace, which is replaced with an empty string ("").
gsub("\\s*\\>\\s*(*SKIP)(*FAIL)|\\s+", "", str1, perl = TRUE)
#[1] "foo > bar > foobar > foo > test > testthis"
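To see which substrings the SKIP/FAIL pattern actually matches, one way is to extract the matches instead of replacing them. This is a sketch using base R's gregexpr and regmatches; the variable names are illustrative.

```r
str1 <- "foo > bar > foo bar > foo > test > test this"
pat <- "\\s*>\\s*(*SKIP)(*FAIL)|\\s+"

# Extract every match: only the two spaces inside "foo bar" and
# "test this" are returned; the whitespace around ">" is skipped.
hits <- regmatches(str1, gregexpr(pat, str1, perl = TRUE))[[1]]
hits
#[1] " " " "
```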
Or another option is to match a space between two word characters. Here the space is matched between a positive lookbehind for a word character ((?<=\\w)) and a positive lookahead for a word character or the end of the string ((?=\\w|$)). Note that $ must be left unescaped to mean the end of the string; \\$ would match a literal dollar sign.
gsub("(?<=\\w)\\s(?=\\w|$)", "", str1, perl = TRUE)
#[1] "foo > bar > foobar > foo > test > testthis"
Or, without using regex lookarounds, we can capture the word characters on either side of the space and put them back with backreferences
gsub("(\\w)\\s(\\w)", "\\1\\2", str1)
#[1] "foo > bar > foobar > foo > test > testthis"
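One difference between the lookaround version and the capture-group version is worth a quick check. The capture-group pattern consumes the word characters on both sides of the space, so consecutive single-character words can be missed, while the lookarounds consume only the space. A sketch with a made-up input:

```r
# Made-up input with single-character words
y <- "a b c > d"

# Capture groups: "a b" is consumed as one match, so the space before
# "c" has no unconsumed word character to its left.
by_groups <- gsub("(\\w)\\s(\\w)", "\\1\\2", y)
by_groups
#[1] "ab c > d"

# Lookarounds: only the space itself is consumed, so both are removed.
by_lookaround <- gsub("(?<=\\w)\\s(?=\\w)", "", y, perl = TRUE)
by_lookaround
#[1] "abc > d"
```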
# Data used in the examples above
str1 <- "foo > bar > foo bar > foo > test > test this"
Upvotes: 4