twieg

Reputation: 73

gsub replace pattern with backspace

I have a data set with a column that contains a label with the year (OldLabel), and I want to make another column that contains only the label, not the year (NewLabel). I wrote the following code, but it leaves a space at the end of the new labels.

data["NewLabel"] <- gsub("20..", "", data$OldLabel)
#removes any part of OldLabel that is "20" followed by any two characters, e.g. 2011 or 2008

Is there a way to have gsub replace the sequence with a backspace, so it also gets rid of any spaces around the year it replaces? I tried using "\\b" as my replacement text, but that just replaced the year with the letter b, not a backspace.

EDIT: Per request, an example of OldLabel would be "Valley Summer 2014", which should become "Valley Summer", but ends up as "Valley Summer " with my current code. However, some labels are of the form "2012 Valley Summer", so I don't think simply adding a space to the pattern would be robust enough.
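A minimal sketch of a different approach, not taken from either answer: remove the year with a stricter pattern ("20" followed by two digits, assuming every year falls between 2000 and 2099), then let base R's trimws() drop the whitespace left at either end. The sample labels are the two forms given in the question.

```r
# Sample labels in both forms mentioned in the question
labels <- c("2012 Valley Summer", "Valley Summer 2014")

# Remove "20" + two digits, then strip leading/trailing whitespace
new_labels <- trimws(gsub("20[0-9]{2}", "", labels))
print(new_labels)
#[1] "Valley Summer" "Valley Summer"

# Applied to the question's data frame (assumes a column named OldLabel):
# data$NewLabel <- trimws(gsub("20[0-9]{2}", "", data$OldLabel))
```

trimws() only touches the ends of the string, so a year in the middle of a label would still leave two adjacent spaces behind.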

Upvotes: 1

Views: 456

Answers (2)

IRTFM

Reputation: 263331

Try this:

 data["NewLabel"] <- gsub("[ ]{0,1}20[[:digit:]]{2}[ ]{0,1}", "", data$OldLabel)

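The paired curly braces are repetition quantifiers whose range is set by either one value (an exact count) or two values (a minimum and a maximum), so [ ]{0,1} matches zero or one space on each side of the year. See ?regex for more details. (You don't want a backspace character in the replacement; make the surrounding spaces part of the pattern so they are removed along with the year.)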

test <- c("2012 Valley Summer", "Valley Summer 2014")
gsub("[ ]{0,1}20[[:digit:]]{2}[ ]{0,1}", "", test)
#[1] "Valley Summer" "Valley Summer"
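As a side note on the question's "\\b" attempt: in a regex pattern \b is a word boundary, not a backspace, and it only has that meaning in the pattern, never in the replacement. A sketch under that reading, using perl = TRUE so \b is handled by PCRE: anchoring the year at word boundaries keeps the pattern from biting into a longer number. The third label and the helper name strip_year are hypothetical, made up for illustration.

```r
# strip_year is a hypothetical helper, not part of base R.
# \\s* swallows adjacent whitespace; \\b requires the four digits to be
# a whole word, so digits inside a longer number are left alone.
strip_year <- function(x) gsub("\\s*\\b20[0-9]{2}\\b\\s*", "", x, perl = TRUE)

# Third label is an invented example containing a five-digit number
test <- c("2012 Valley Summer", "Valley Summer 2014", "Lot 12015 Summer 2014")
print(strip_year(test))
#[1] "Valley Summer"    "Valley Summer"    "Lot 12015 Summer"
```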

Upvotes: 1

AidanGawronski

Reputation: 2085

data["NewLabel"] <- gsub("\\s*[0-9]\\s*", "", data$OldLabel)
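To see how this pattern behaves on the two label forms from the question, a short check is below. Each match is a single digit plus any whitespace around it, so the four digits of a year disappear over successive matches together with the adjacent space.

```r
test <- c("2012 Valley Summer", "Valley Summer 2014")

# One digit (with surrounding whitespace) is removed per match
cleaned <- gsub("\\s*[0-9]\\s*", "", test)
print(cleaned)
#[1] "Valley Summer" "Valley Summer"
```

Because the pattern matches any digit rather than a year specifically, labels that contain other numbers would lose those digits too.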

Upvotes: 0
