ericbrownaustin

Reputation: 1300

Remove spaces between greater than signs

I have a string like this:

foo > bar > foo bar > foo > test > test this

I would like to take each string between the greater-than signs and collapse it into a single word with no internal spaces, while preserving all other spaces, like this:

foo > bar > foobar > foo > test > testthis

I've tried using gsub to remove whitespace, gsub(" ", "", x, fixed = TRUE), but I am not sure how to do this only between the greater-than signs while preserving the spaces next to them.
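To make the problem concrete, here is a minimal check of that attempt. It assumes x holds the example string above; the variable name attempt is only for illustration. Every space is stripped, including the ones around the > signs:

```r
# x is assumed to hold the example string from the question
x <- "foo > bar > foo bar > foo > test > test this"

# Literal (fixed = TRUE) replacement of every space with nothing
attempt <- gsub(" ", "", x, fixed = TRUE)
print(attempt)
# [1] "foo>bar>foobar>foo>test>testthis"
```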

Upvotes: 4

Views: 175

Answers (3)

s_baldur

Reputation: 33488

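Here is an (arguably) more tractable solution for the non-regex-expert, where str1 holds the input string. Note that it always rejoins the parts with " > ", whatever spacing the original had around the > signs:

# Split into parts at each ">"
str2 <- unlist(strsplit(str1, ">", fixed = TRUE))
str2
# [1] "foo "       " bar "      " foo bar "  " foo "      " test "     " test this"

# Remove all spaces within each part
str3 <- gsub(" ", "", str2, fixed = TRUE)
str3
# [1] "foo"      "bar"      "foobar"   "foo"      "test"     "testthis"

# Paste the parts back together, separated by " > "
str_final <- paste(str3, collapse = " > ")
str_final
# [1] "foo > bar > foobar > foo > test > testthis"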
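The split, clean, and paste steps can also be wrapped in a small function that works on a character vector. This is only a sketch: collapse_between_gt is a hypothetical helper name, not something from the answer, and it assumes ">" is always the separator and " > " is always the desired output spacing.

```r
# Hypothetical helper (not part of the original answer): applies the
# split / remove-spaces / paste steps to each element of a character vector.
collapse_between_gt <- function(x) {
  # strsplit returns a list with one character vector per input string
  parts <- strsplit(x, ">", fixed = TRUE)
  vapply(parts, function(p) {
    # Drop every space inside each part, then rejoin with " > "
    paste(gsub(" ", "", p, fixed = TRUE), collapse = " > ")
  }, character(1))
}

str1 <- "foo > bar > foo bar > foo > test > test this"
result <- collapse_between_gt(c(str1, "a b > c d"))
print(result)
# [1] "foo > bar > foobar > foo > test > testthis"
# [2] "ab > cd"
```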

Upvotes: 0

Wiktor Stribiżew

Reputation: 626861

You can do this with a pattern that matches and captures > together with any surrounding whitespace, (\s*>\s*), and matches without capturing every other run of one or more whitespace characters, \s+. Replacing each match with a backreference to Group 1 (\1) puts the > separators back unchanged and deletes the other whitespace, because Group 1 is empty whenever the second alternative matched:

gsub("(\\s*>\\s*)|\\s+", "\\1", x)

Or, to make \s Unicode-aware, use the PCRE engine with the (*UCP) verb:

gsub("(*UCP)(\\s*>\\s*)|\\s+", "\\1", x, perl=TRUE)

See the regex demo.

Details

  • (\s*>\s*) - Capturing group 1: 0+ whitespaces, >, 0+ whitespaces
  • | - or
  • \s+ - 1+ whitespace chars.

See R demo online:

x <- "foo > bar > foo bar > foo > test > test this"
gsub("(\\s*>\\s*)|\\s+", "\\1", x)
## => [1] "foo > bar > foobar > foo > test > testthis"
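Since gsub is vectorized, the same call handles several strings at once. The sketch below uses made-up inputs (inputs and cleaned are illustrative names) and also shows that the whitespace around > is kept exactly as written, because it is part of the captured group:

```r
# Illustrative inputs, not from the original answer
inputs <- c("foo > bar > foo bar", "a b > c d e", "foo>bar baz")

# Group 1 keeps ">" plus its surrounding whitespace; all other whitespace goes
cleaned <- gsub("(\\s*>\\s*)|\\s+", "\\1", inputs)
print(cleaned)
# [1] "foo > bar > foobar" "ab > cde"           "foo>barbaz"
```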

Upvotes: 1

akrun

Reputation: 887128

One option is PCRE's (*SKIP)(*FAIL) idiom. The first alternative matches zero or more whitespace characters (\\s*), then >, then zero or more whitespace characters (\\s*). When the engine reaches (*SKIP), it marks the current position, and (*FAIL) then forces the match to fail. Because of (*SKIP), the engine does not backtrack into or retry the text already consumed, so each > and its surrounding spaces are skipped. Every remaining run of whitespace is matched by the second alternative (|\\s+) and replaced with an empty string (""):

gsub("\\s*\\>\\s*(*SKIP)(*FAIL)|\\s+", "", str1, perl = TRUE)
#[1] "foo > bar > foobar > foo > test > testthis"
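To see what the SKIP/FAIL pattern actually matches, one can inspect the matches with gregexpr and regmatches instead of replacing them. This is a sketch under the same assumptions as above (str1 as defined under "data"; pattern, m, and matched are illustrative names). Only the two spaces inside "foo bar" and "test this" should be reported:

```r
str1 <- "foo > bar > foo bar > foo > test > test this"
pattern <- "\\s*>\\s*(*SKIP)(*FAIL)|\\s+"

# Locate every match; the spaces around ">" are skipped, never matched
m <- gregexpr(pattern, str1, perl = TRUE)
matched <- regmatches(str1, m)[[1]]

print(length(matched))      # number of whitespace runs that would be removed
print(as.integer(m[[1]]))   # start positions of those runs
```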

Another option is to match a whitespace character that sits between two word characters, using a positive lookbehind for a word character ((?<=\\w)) and a positive lookahead for either a word character or the end of the string ((?=\\w|$)). The $ anchor is left unescaped here, since an escaped \\$ would match a literal dollar sign instead:

gsub("(?<=\\w)\\s(?=\\w|$)", "", str1, perl = TRUE)
#[1] "foo > bar > foobar > foo > test > testthis"

Or, without regex lookarounds, we can capture the word character on each side of the space and put both back:

gsub("(\\w)\\s(\\w)", "\\1\\2", str1)
#[1] "foo > bar > foobar > foo > test > testthis"
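One difference between the capture-group and lookaround variants is worth checking on input with consecutive one-letter words. The sketch below uses a made-up string (edge, with_captures, and with_lookarounds are illustrative names). The capture-group pattern consumes the character on each side of the space, so a character cannot serve as the right neighbour of one match and the left neighbour of the next; the lookarounds consume nothing and do not have this limitation:

```r
# Illustrative edge case, not from the original answer
edge <- "a b c > d e"

# Captures consume both neighbours, so the space before "c" is left behind
with_captures <- gsub("(\\w)\\s(\\w)", "\\1\\2", edge)

# Lookarounds are zero-width, so every qualifying space is removed
with_lookarounds <- gsub("(?<=\\w)\\s(?=\\w)", "", edge, perl = TRUE)

print(with_captures)
# [1] "ab c > de"
print(with_lookarounds)
# [1] "abc > de"
```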

data

str1 <- "foo > bar > foo bar > foo > test > test this"

Upvotes: 4
