Reputation: 73
I have a data set with a column that contains a label with the year (OldLabel), and I want to make another column that contains only the label, not the year (NewLabel). I wrote the following code, but it leaves a space at the end of the new labels.
data["NewLabel"] <- gsub("20..", "", data$OldLabel)
# Intended to remove any four-character run that starts with 20, e.g. 2011 or 2008 (note that "." matches any character, not only a digit)
Is there a way to have gsub replace the sequence with a backspace, so it gets rid of any spaces around the year it replaces? I tried using "\\b" as my replacement text, but that just replaced it with b, not a backspace.
EDIT: Per request, an example of OldLabel would be "Valley Summer 2014", which should become "Valley Summer", but ends up being "Valley Summer " with my current code. However, some might also be of the form "2012 Valley Summer", so I don't think simply including a space in the pattern would be robust enough.
Upvotes: 1
Views: 456
Reputation: 263331
Try this:
data["NewLabel"] <- gsub("[ ]{0,1}20[[:digit:]]{2}[ ]{0,1}", "", data$OldLabel)
The paired curly braces are repetition quantifiers whose range is set by either one value (an exact count) or two values (min and max). Here, [ ]{0,1} matches an optional single space and [[:digit:]]{2} matches exactly two digits. See ?regex for more details. (You don't want to replace them with backspace characters.)
test <- c("2012 Valley Summer", "Valley Summer 2014")
gsub("[ ]{0,1}20[[:digit:]]{2}[ ]{0,1}", "", test)
#[1] "Valley Summer" "Valley Summer"
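As a small illustration of the quantifiers, {0,1} behaves like ?, so the same pattern can be written more compactly. This is only a sketch; the names labels, compact and verbose are made up for the example.

```r
# Hypothetical sample labels, taken from the examples in the question.
labels <- c("2012 Valley Summer", "Valley Summer 2014")

# " ?" is shorthand for "[ ]{0,1}": an optional single space.
compact <- gsub(" ?20[[:digit:]]{2} ?", "", labels)

# The longer form from the answer, for comparison.
verbose <- gsub("[ ]{0,1}20[[:digit:]]{2}[ ]{0,1}", "", labels)

# Both spellings strip the year and its adjacent space.
identical(compact, verbose)
```

Note that if a year ever sat in the middle of a label, both adjacent spaces would be removed, so this form assumes the year is at the start or the end.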
Upvotes: 1
Reputation: 2085
data["NewLabel"] <- gsub("\\s*[0-9]\\s*", "", data$OldLabel)
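One caveat with this pattern: it matches every single digit (plus any whitespace around it), not just four-digit years, so labels that contain other numbers would lose them too. The sketch below shows that on a made-up label and, as one possible alternative (an assumption, not part of the answer above), removes only the year and then tidies the whitespace with trimws(). The names x, digits_removed and strip_year are hypothetical.

```r
# Made-up labels; the third one contains a number that is not a year.
x <- c("2012 Valley Summer", "Valley Summer 2014", "Route 66 Summer 2014")

# The digit-based pattern removes all digits and the spaces next to them.
digits_removed <- gsub("\\s*[0-9]\\s*", "", x)
# digits_removed[3] is "RouteSummer"

# Alternative sketch: drop only 20xx, squeeze repeated spaces, trim the ends.
strip_year <- function(label) {
  no_year <- gsub("20[0-9]{2}", "", label)   # remove the year itself
  squeezed <- gsub("\\s+", " ", no_year)     # collapse runs of whitespace
  trimws(squeezed)                           # drop leading/trailing spaces
}

strip_year(x)
# [1] "Valley Summer"   "Valley Summer"   "Route 66 Summer"
```

Separating "remove the year" from "clean up spaces" means the position of the year in the label no longer matters.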
Upvotes: 0