giraffehere

Reputation: 1148

R Match And Sub On Space Between Specific Characters

I need a little help with a regular expression using gsub. Take this object:

x <- "4929A 939 8229"

I want to remove the space in between "A" and "9", but I am not sure how to match on only the space between them and not on the second space. I essentially need something like this:

x <- gsub("A 9", "", x)

But I am not sure how to write the regular expression to not match on the "A" and "9" and only the space between them.

Thanks in advance!

Upvotes: 0

Views: 53

Answers (2)

Rentrop

Reputation: 21497

gsub replaces every match of the pattern, whereas sub replaces only the first one. So

sub(" ", "", "4929A 939 8229") # returns "4929A939 8229"

will do the job.
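To make the difference concrete, here is a small side-by-side sketch on the string from the question. The variable names first_only and all_spaces are just illustrative.

```r
x <- "4929A 939 8229"

# sub() stops after the first match, so the second space survives.
first_only <- sub(" ", "", x)
first_only
# [1] "4929A939 8229"

# gsub() keeps going and removes every space.
all_spaces <- gsub(" ", "", x)
all_spaces
# [1] "4929A9398229"
```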

Removing the second/nth occurrence

You can do that with strsplit, for example:

x <- c("4929A 939 8229", "4929A 9398229")

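# Rejoin the pieces of one split string, putting `replacement` where the nth separator was.
collapse_nth <- function(x_split, split, nth, replacement){
  left <- paste(x_split[seq_len(nth)], collapse = split)
  right <- paste(x_split[-seq_len(nth)], collapse = split)
  paste(left, right, sep = replacement)
}

# Replace the nth occurrence of `split` in each element of x.
# Elements that split into nth pieces or fewer are returned unchanged.
remove_nth <- function(x, nth, split, replacement = ""){
  x_split <- strsplit(x, split, fixed = TRUE)
  x_len <- vapply(x_split, length, integer(1))
  out <- x
  has_nth <- x_len > nth
  out[has_nth] <- vapply(x_split[has_nth], collapse_nth, character(1), split, nth, replacement)
  out
}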

Which gives you:

# > remove_nth(x, 2, " ")
# [1] "4929A 9398229" "4929A 9398229"

and

# > remove_nth(x, 2, " ", "---")
# [1] "4929A 939---8229" "4929A 9398229" 
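As a possible alternative to splitting, the same idea can be sketched with a single regular expression. This is not part of the approach above, just an assumption about how one might do it with sub and perl = TRUE. The pattern captures everything up to the nth space and then drops that space; the names nth, pattern and res are illustrative.

```r
x <- c("4929A 939 8229", "4929A 9398229")
nth <- 2

# Group 1: (nth - 1) "word + space" chunks followed by one more word.
# The space right after the group is the nth one; it is matched but not put back.
pattern <- sprintf("^((?:[^ ]* ){%d}[^ ]*) ", nth - 1)

# Strings with fewer than nth spaces do not match and are left unchanged.
res <- sub(pattern, "\\1", x, perl = TRUE)
res
# [1] "4929A 9398229" "4929A 9398229"
```

Unlike the strsplit version, this sketch hard-codes a literal space as the separator, so a different separator would need a different pattern.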

Upvotes: 2

Wiktor Stribiżew

Reputation: 626893

You may use the following regex in sub:

> x <- "4929A 939 8229"
> sub("\\s+", "", x)
[1] "4929A939 8229"

The \\s+ pattern matches one or more whitespace characters. Because sub replaces only the first match, only the first run of whitespace is removed.

The replacement part is an empty string.
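If the match should also depend on the neighbouring characters, as the question asks, two common options are lookarounds and capture groups. The following is a sketch rather than part of the answer above; the variable names are illustrative.

```r
x <- "4929A 939 8229"

# \\s+ covers a whole run of whitespace (spaces, tabs), not just one space.
res_run <- sub("\\s+", "", "4929A  \t939 8229")

# Lookarounds (need perl = TRUE): match whitespace only when it is
# preceded by "A" and followed by "9"; the letters are not consumed.
res_look <- gsub("(?<=A)\\s+(?=9)", "", x, perl = TRUE)

# Capture groups: match "A", the whitespace and "9", then put back
# only the two captured characters.
res_group <- gsub("(A)\\s+(9)", "\\1\\2", x)

res_look
# [1] "4929A939 8229"
res_group
# [1] "4929A939 8229"
```

Both variants leave the second space alone because it is not sitting between an "A" and a "9".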

See the online R demo

Upvotes: 2
